Introduction
Lab book for analyses using hierarchical computational modelling to identify parameters that define the best model of learning as it applies to fear conditioning acquisition and extinction, using FLARe fear conditioning data. Long abstract, justification and analysis plan can be found in the preliminary manuscript here
In short:
Aims
Identify model of learning based on a priori hypotheses that best fits the trajectories of fear relevant learning in our FLARe dataset
- Use all first week data from Validation, app TRT, lab TRT, Pilot, Headphones (n = 223 after exclusions)
- Include Acquisition, extinction (trajectories representing fear learning and treatment)
- Identify parameters that define these trajectories
- e.g. Learning rate, plateau, first ambiguous trial etc.
Cross validate best fitting model in TEDS data
Are these parameters associated with other measures of individual differences in our datasets?
- Personality (Neuroticism)
- Current anxiety symptoms (GAD-7) - equivalent of baseline symptoms (Chris + Meg analyses)
- Lifetime / trait anxiety (STAI / ASI - FLARe analyses)
- Current depression symptoms (PHQ-9) - equivalent of baseline symptoms (Chris + Meg analyses)
- Interpretation biases (IUS, ASSIQ - FLARe analyses)
- SES (Meg IAPT: benefits, employment etc)
- Gender (Meg analyses)
- Emotion regulation profile (potentially LCA based?)
Impact and relevance
Evidence from both human (Richter et al., 2012) and rodent (Galatzer-Levy, Bonanno, Bush, & LeDoux, 2013) studies suggests that trajectories of how we learn and extinguish fear differ between individuals. Different trajectories of fear acquisition and extinction have also been found using fear conditioning paradigms (e.g. Duits et al., 2016), which provide a good model for the learning of, and treatment for, fear and anxiety disorders. It is likely that these trajectories of fear extinction might predict outcomes in exposure-based cognitive behavioural therapy (Kindt, 2014).
Identifying parameters that predict individual trajectories of fear learning and extinction will enable us to harness fear conditioning data more effectively to aid in understanding mechanisms underlying the development of and treatment for anxiety disorders. With more accurate models of these processes, the potential to use fear conditioning paradigms to predict those most at risk of developing an anxiety disorder, and those who might respond best to exposure-based treatments, greatly improves.
Analysis plan
Define set of a priori models moving from simple to more complex
- Some parameters to include:
- Rate of learning (sometimes with punishment reinforcement)
- Sensitivity to punishment
- Pre-existing anxiety
- SES? Gender?
Run each model and compare fit in FLARe pre TEDS data
- Use Log likelihood and BIC etc.
Select best fitting model
Extract individual data for learning parameters from this model and see what factors best predict it
- Anxiety (if anxiety isn't best included as part of the model)
- Interpretation biases
- Tolerance of uncertainty
- Cognitive emotional control
- emotional attentional control
- SES?
- Gender?
Run all models again in FLARe TEDS
- Decide if the same model best fits the data again.
- See if we get similar results from the parameter prediction
Will use a combination of R (Version 3.5.1), RStan (Version 2.18.2, GitRev: 2e1f913d3ca3) and the hBayesDM package, which uses RStan: Ahn, W.-Y., Haines, N., & Zhang, L. (2017). Revealing neuro-computational mechanisms of reinforcement learning and decision-making with the hBayesDM package. Computational Psychiatry, 1, 24-57.
Modelling notes
Intuition
Discussion with Vince Valton and Alex Pike about the best way to fit this model. As the observed outcomes (expectancy ratings) are non-binary and are related to each other (i.e. as you become more likely to select 9, you become less likely to select 1), we should consider each trial for each person for each stimulus as a constantly updating beta distribution. So you might see a pattern like this for the CS+ in acquisition, for example.
So, best model is likely to be one using beta distributions that show the probability distribution for each rating.
We can use sufficient parameters to describe these (i.e. mean / sd or possibly the mode)
A useful intuition of the beta distribution can be found here
and a useful website here
scaling
We can scale the beta by how aversive participants find the shock. i.e. it might update their learning as if there was .5 a shock or 1.5 of a shock depending on their own sensitivity to the aversiveness / punishment.
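A minimal R sketch of this idea (all names hypothetical, not from the analysis scripts): the outcome that drives the update is multiplied by a per-person aversiveness weight s.

```r
# Hypothetical sketch of aversiveness scaling: the outcome that drives
# learning is multiplied by a per-person sensitivity weight s, so a
# scream can count as "half a shock" (s = 0.5) or "1.5 shocks" (s = 1.5).
scaled_update <- function(v, outcome, alpha, s) {
  v + alpha * (s * outcome - v)
}

# a more punishment-sensitive participant updates further towards the outcome
v_high <- scaled_update(v = 0.5, outcome = 1, alpha = 0.2, s = 1.5)
v_low  <- scaled_update(v = 0.5, outcome = 1, alpha = 0.2, s = 0.5)
```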
alpha
generalisation
We can do this with a single beta distribution for each phase (collapsing over the two stimuli). This would be akin to a per phase generalisation parameter in that it will be smaller if they tend to choose the same expectancy for both stimuli and larger if they tend to choose very differently for both stimuli.
However, because these stimuli are not really equivalent (i.e. the reinforcement rate differs between them, and we use this in the model), a single collapsed distribution would be misleading.
So instead we can create a parameter which is the value of the CS- weighted by some value of the CS+. How much each individual weights by the CS+ can be freely estimated by the model and can serve as the generalisation parameter.
So this would be vminus = vminus + (w)vplus, where the w parameter is freely estimated per person.
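A one-line R sketch of this update (names hypothetical):

```r
# Hypothetical sketch of the generalisation update: the CS- value is
# contaminated by a weighted share of the CS+ value, with w estimated
# freely per person (w = 0 means no generalisation).
generalise <- function(v_minus, v_plus, w) {
  v_minus + w * v_plus
}
```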
per stimulus
We probably want to model the CS+ and CS- separately too - so have a beta distribution characterised by sufficient parameters for each.
per trial
All of the above can then also be done with updating per trial.
leaky beta
We also need a model that incorporates 'leak', i.e. learning leak - it is likely that participants will update more based on more recent trials and learn less from more distant trials as time progresses. See Toby's paper for more.
uncertainty
We should consider incorporating a parameter that maps to participant uncertainty about outcomes.
anxiety
Might be worth incorporating this as a model parameter / feature. Read this for more.
Hypotheses About the Relationship of Cognition With Psychopathology Should be Tested by Embedding Them Into Empirical Priors (Moutoussis et al., 2018)
Log likelihood notes
As we are using a beta distribution, we will calculate the log likelihood based on the probability density function for the distribution (i.e. where the peak of the shape will be) given the participant's response at each trial. So we will add the log probability density given each trial response, trial by trial, for each of the CS+ and CS-, summed together.
We will obtain one log likelihood per trial and add these together, to make sure that the totals are comparable across models.
the basic stan terminology for this is below:
beta_lpdf(rating[t,p]|shape1[t,p],shape2[t,p])
where beta_lpdf is the probability density given the rating made and each of the two beta distribution shape parameters that we estimate.
This is what we will use to compare models.
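In R, the same quantity can be computed with dbeta(..., log = TRUE), which corresponds to Stan's beta_lpdf. A sketch (function name hypothetical), summing the per-trial log densities over both stimuli for one person:

```r
# Sum of per-trial beta log densities for one person, mirroring
# beta_lpdf(rating | shape1, shape2) in the stan script. Shape
# arguments are recycled over trials if given as scalars.
loglik_person <- function(ratings_plus, ratings_minus,
                          shape1_plus, shape2_plus,
                          shape1_minus, shape2_minus) {
  sum(dbeta(ratings_plus,  shape1_plus,  shape2_plus,  log = TRUE)) +
    sum(dbeta(ratings_minus, shape1_minus, shape2_minus, log = TRUE))
}
```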
Terminology
V == ‘value’. Basically a parameter that reflects the salience of the stimulus at any given point.
alpha == ‘learning rate’. A parameter that describes how sensitive people are to updating their learning. A fast learning rate means that learning on any given trial is weighted more towards the trials immediately preceding it than towards past ones, whereas a slow learning rate means that all past events influence learning more evenly. Alex’s tennis analogy is good here (Federer - stable player, can predict a win based on all matches; Murray - volatile player, his last match is the best predictor of his next match performance).
beta == ‘confidence’. This is sort of an error term - how much variance in rating choice there is for each person/trial. Can be thought of as the variance, or beta^2 as the sd.
Can be confusing as we are using beta distributions (a different thing, which has two sufficient parameters a + b).
Beta distribution visualisation
and how they change depending on whether you change the beta or alpha parameters.
Here are some simulations I can change and play with to illustrate the same sort of thing.
[plots: "stable beta, increasing alpha" and "stable alpha, increasing beta"]
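The visualisations above can be regenerated with a short sketch (shape values chosen for illustration):

```r
# Beta densities with one shape parameter held stable while the other
# increases, matching the two panels above.
x <- seq(0.01, 0.99, length.out = 99)
stable_beta  <- sapply(c(1, 2, 4, 8), function(a) dbeta(x, a, 2))
stable_alpha <- sapply(c(1, 2, 4, 8), function(b) dbeta(x, 2, b))
matplot(x, stable_beta,  type = "l", main = "stable beta, increasing alpha")
matplot(x, stable_alpha, type = "l", main = "stable alpha, increasing beta")
```

Increasing the first shape parameter pushes the mass of the distribution towards 1; increasing the second pushes it towards 0.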
Models to write / run
Will probably do all per trial. Will do an early sensitivity check to confirm this.
- Single beta, no scaling
- Single beta, no scaling per trial.
*** At this point, compare the two above. Ensure the per trial fits better, and if it does then do all below per trial***
- "" scaled
- Single beta Single alpha reinforcement learning model (estimate both the beta and the alpha i.e. learning rate)
- Single beta single alpha reinforcement learning with mean + sd for the beta estimate as a parameter
- Beta per stimulus
- Beta per stimulus + generalisation parameter (Vminus = vminus + wvplus)
- Leaky beta
- Leaky beta + uncertainty
- Leaky beta + uncertainty + anxiety
Justification of model components
Alpha Learning rate parameter. If high, learning will be strongly influenced by recent trial events; if low, it will be influenced more evenly by accumulating events.
- Single alpha per person
- Assumes that learning rate is a constant for each individual that might be scaled by other factors, such as certainty or sensitivity.
Betas Variance/certainty parameter
- Single beta per person
- Assumes that the general variance around ratings is the same regardless of stimulus, i.e. as much uncertainty for the CS+ as for the CS-
- Two betas per person
- Assumes that confidence / uncertainty might differ by stimulus. Presumably as a factor of reinforcement rate.
Preliminary
Compare a priori to data
Simulate different learning rates
We are only doing this ‘accurately’ for the acquisition CS+, as the simulations require a reinforcement probability. I am using the contingency for this (0.75). If set to 0 for all other phases and stimuli, the learning looks flat regardless of alpha. We expect in reality that this probability will vary between people and will be unlikely to be zero, so we also test 12 and 18 trials with probabilities of 0.5 and 0.2.
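The simulated trajectories below come from a Rescorla-Wagner-style update towards the reinforcement probability; a deterministic expected-value sketch (function name hypothetical) is:

```r
# Deterministic sketch of the learning-rate simulation: expectancy moves
# towards the reinforcement probability at rate alpha on each trial.
simulate_learning <- function(ntrials, prob, alpha, v0 = 0.5) {
  v <- numeric(ntrials)
  v[1] <- v0
  for (t in 2:ntrials) {
    v[t] <- v[t - 1] + alpha * (prob - v[t - 1])
  }
  v
}

# a fast learner reaches the 0.75 contingency within a few trials
traj_fast <- simulate_learning(12, 0.75, 0.8)
traj_slow <- simulate_learning(12, 0.75, 0.1)
```

With prob set to 0 and a starting value of 0, the trajectory is flat regardless of alpha, as noted above.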
12 trials; probability = 0.75
[plot: "Simulated learning rates. 12 trials; probability = 0.75 (CSp acq contingency)"]
12 trials; probability = 0.5
[plot: "Simulated learning rates. 12 trials; probability = 0.5"]
12 trials; probability = 0.2
[plot: "Simulated learning rates. 12 trials; probability = 0.2"]
18 trials; probability = 0.5
[plot: "Simulated learning rates. 18 trials; probability = 0.5"]
18 trials; probability = 0.2
[plot: "Simulated learning rates. 18 trials; probability = 0.2"]
Plot subset of trajectories in flare
[plots: subset of individual FLARe expectancy trajectories]
Try RStan
See if the basic punishment only learning model for the CS+ and CS- works with the FLARe master data
Run the 8schools check
From the rstan github
This is to check that all is compiling and working, and to give an idea of data format etc.
Set up procedure to create and sync models.
This directs to my local machine here /Users/kirstin/Dropbox/SGDP/FLARe/FLARe_MASTER/Projects/Hierachal_modelling/Scripts and is remotely linked to the github repository here.
Make sure the most up to date stan file is in the remote repo
Analyses
Function block
test function
A function for running minimal, medium or maximal tests of the stan models. This changes how many chains and iterations are run.
testing <- function(x) {
  if (x %in% c('min', 'Min')) {
    chain_iter <<- 400
    warm_up <<- 100
    chain_n <<- 1
  } else if (x %in% c('med', 'Med')) {
    chain_iter <<- 1000
    warm_up <<- 500
    chain_n <<- 1
  } else if (x %in% c('max', 'Max')) {
    chain_iter <<- 2000
    warm_up <<- 1000
    chain_n <<- 2
  } else if (x %in% c('full', 'Full')) {
    chain_iter <<- 4000
    warm_up <<- 1000
    chain_n <<- 4
  } else if (x %in% c('skip', 'Skip')) {
    chain_iter <<- 0
    warm_up <<- 0
    chain_n <<- 0
  }
}
Model run, load or skip
A function for either running the model, loading in the fit data if it already exists and doesn't need redoing, or skipping the block entirely.
x is the testing command ('skip' or 'load' to skip or load); y is the stan script; z is the flare_data set to use.
Note that this needs to have scriptdir and datadir existing in the workplace
model_run <- function(x, y, z) {
  if (x %in% c('skip', 'Skip')) {
    stop("skipping this model")
  }
  if (x %in% c('min', "med", "max", "Min", "Med", "Max", "full", "Full")) {
    print("running model")
    testing(x)
    stanname <- y
    stanfile <- file.path(scriptdir, stanname)
    flare_data <- z
    # note that flare_data is set up elsewhere (see block below)
    # note: warm_up is set by testing() but not currently passed to stan(),
    # so stan() uses its default warmup of iter/2
    flare_fit <- stan(file = stanfile, data = flare_data, iter = chain_iter, chains = chain_n) # add working dir?
    save_name <- gsub(".stan", ".rds", stanname)
    saveRDS(flare_fit, file = file.path(datadir, save_name))
    print(traceplot(flare_fit, 'lp__'))
    # extract fit data
    return(summary(flare_fit))
  }
  if (x %in% c('load', "Load")) {
    print("Loading existing model fit data")
    stanname <- gsub(".stan", ".rds", y)
    # use file.path to match how the fit was saved above
    fitfile <- readRDS(file = file.path(datadir, stanname))
    print(traceplot(fitfile, 'lp__'))
    return(summary(fitfile))
  }
}
out describe
Function for describing the mean etc of freely estimated parameters from STAN output
out_describe <- function(summary, n, all = NULL) {
  library(dplyr)
  print(paste0(chain_iter, " iterations on ", chain_n, " chains."))
  print(paste("Estimated", (dim(summary$summary)[1] - 1) / nsub, "Free parameters per person", sep = " "))
  summary <- data.frame(summary$summary[1:(dim(summary$summary)[1] - 1), ])
  table <- summary %>%
    mutate(parameter = rep(1:(dim(summary)[1] / n), each = n)) %>%
    group_by(parameter) %>%
    summarize(mean = mean(mean, na.rm = T),
              se_mean = mean(se_mean, na.rm = T),
              sd = mean(sd, na.rm = T),
              Rhat = mean(Rhat, na.rm = T))
  param_names <- row.names(summary)[seq(1, dim(summary)[1], n)]
  table$parameter <- param_names
  if (is.null(all) & dim(table)[1] > 10) {
    print("This table is very large. Returning only the top 6 entries unless you have set the 3rd function option to 'all'. ")
    return(head(table))
  } else {
    return(table)
  }
}
BIC
Canonical BIC function from log likelihood, courtesy of Alex Pike
## canonical BIC function (Alex Pike's)
bic <- function(trials, neg_log_like, nparam) {
  if (sum(neg_log_like < 0) > 0) { print('check this is negative log likelihood!!') }
  2 * neg_log_like + nparam * log(trials) # canonical
}
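A usage sketch with illustrative numbers (the function is restated so the block is self-contained): for a fit over 24 trials (12 CS+ and 12 CS-) with 2 free parameters and a negative log likelihood of 30,

```r
## canonical BIC from negative log likelihood (restated from above)
bic <- function(trials, neg_log_like, nparam) {
  if (sum(neg_log_like < 0) > 0) { print('check this is negative log likelihood!!') }
  2 * neg_log_like + nparam * log(trials) # canonical
}

bic(trials = 24, neg_log_like = 30, nparam = 2) # = 60 + 2*log(24)
```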
model plot
a function for plotting a BIC barchart for the different models contained in a dataset
## model compare plot function
plot_models <- function(dataset) {
  dataset$BIC <- odp(as.numeric(dataset$BIC))
  dataset <- as.data.frame(na.omit(dataset))
  yminv <- min(dataset$BIC, na.rm = T) - 5
  ymaxv <- max(dataset$BIC, na.rm = T) + 5
  plot <- ggplot(dataset, aes(x = reorder(model, BIC), y = BIC)) +
    geom_bar(stat = "identity") +
    coord_flip() +
    labs(title = "Model comparison",
         y = "Bayesian Information Criterion (BIC)") +
    scale_y_continuous(limits = c(yminv, ymaxv), oob = rescale_none)
  show(plot)
}
ODP
the below is a function that will format your numbers to two decimal places using sprintf
odp <- function(x) {
  as.numeric(sprintf("%2.2f", x))
}
Create datasets
notes
We need to rescale our dataset here to be between 0 and 1.
Importantly, because we are using the proportion of trials that are not reinforced as a known parameter for statistical reasons (we don’t want a proportion of .75 and 1, better to have .25 and 0), we have made our rescaled expectancy values as 1 - rescaled(x). This means that we will still be able to interpret the results in the expected way (i.e. higher rating is greater expectation of the outcome).
Expectancy data
load in the week 1 app and lab data for FLARe pilot, TRT and headphones studies. Make it long form.
Try with acquisition data first. This is formatted with no column names and no missing data.
Derive the n parameter for both files and check these match
set up trial number
# create the n trials variable for RStan
ntrials=12
stanname='punish_only.stan'
minus_name <- 'bayes_acq_minus.csv'
plus_name <- "bayes_acq_plus.csv"
stanfile <- file.path(scriptdir, stanname)
minusfile <- file.path(datadir,minus_name)
plusfile <- file.path(datadir,plus_name)
minus <- fread(minusfile,data.table=F)
plus <-fread(plusfile,data.table=F)
nacqm <- dim(minus)[1]
nacqp <- dim(plus)[1]
## check that these match and create nsub variable for RStan
if (nacqm == nacqp) {
  print('subject number match')
  nsub <- nacqm
  print(paste('nsub set to', nsub, sep = " "))
} else {
  print('WARNING: subject number does not match. Check master dataset')
}
[1] "subject number match"
[1] "nsub set to 335"
# check the file format is ok
minus[1:2,]
V1 V2 V3 V4 V5 V6 V7 V8 V9 V10 V11 V12
1 5 4 3 1 2 2 1 2 3 2 3 2
2 8 8 1 5 4 3 2 1 1 1 1 1
plus[1:2,]
V1 V2 V3 V4 V5 V6 V7 V8 V9 V10 V11 V12
1 5 6 7 4 7 7 6 7 2 8 7 8
2 1 9 9 5 8 8 9 9 9 8 9 8
The expectancy rating datasets look like they are formatted fine and ntrials and nsub variables should exist.
make rating data binary
For now, to see if stan runs using the bernoulli-logit function, make binary responses from expectancy, i.e. >= 4.5 == 1, < 4.5 == 0.
binarise <- function(x) {
  ifelse(x >= 4.5, 1, 0)
}
minusb <- data.frame(apply(minus,2,function(x) binarise(x)))
plusb <- data.frame(apply(plus,2,function(x) binarise(x)))
Proportion screams data
This is a vector containing the absolute number of trials where no scream occurred for each stimulus. As there was a 75% reinforcement rate for the CS+ (9/12 trials), this is a vector of '3's. For the CS-, no trials were reinforced, so this is a vector of '12's.
No_scream_p <- rep(3,nsub)
No_scream_m <- rep(12,nsub)
Scream per trial data
Create datasets for the acquisition CS- and extinction CS+ and CS- reflecting that no screams occurred at all. Then use the pattern id variable to create a dataset for the acquisition CS+ indicating when a scream occurred for each participant.
## Create the no scream datasets for all
screamMinus <- matrix(0L,nrow=nsub, ncol=ntrials)
library(data.table)
## read in the screams for acquisition
screamPlus <- fread("/Users/kirstin/Dropbox/SGDP/FLARe/FLARe_MASTER/Projects/LatentGrowth/Datasets/bayes_screams_acq.csv",data.table=F)
#
# ### for the time being, simulate the NA data for the studies where I haven't yet finished cleaning the screams.
#
# # make the first trial 1 for everyone, then add 8 additional random 1's per person. Do this in four random patterns to mimic the real data
#
# sc1 <- c(1,1,0,1,0,0,1,1,1,1,1)
# sc2 <- c(0,1,1,1,0,0,1,1,1,1,1)
# sc3 <- c(1,1,1,0,1,0,1,0,1,1,1)
# sc4 <- c(1,0,1,1,0,0,1,1,1,1,1)
#
#
# screamPlus[,1] <- ifelse(is.na(screamPlus[,1]),1,screamPlus[,1])
#
# # for (n in 1:dim(screamPlus)[1]) {
# # print(n)
# # screamPlus[n,2:12] <- sample(patts,1,replace=T)
# # }
#
# for (n in 1:dim(screamPlus)[1]) {
#
# a <- sample(1:4, 1) # pick one of the four patterns
#
# if (is.na(screamPlus[n,2])){
# if (a == 1) {
# screamPlus[n,2:12] <- sc1
# } else if (a == 2) {
# screamPlus[n,2:12] <- sc2
# } else if (a == 3){
# screamPlus[n,2:12] <- sc3
# } else {
# screamPlus[n,2:12] <- sc4
# }
# }
# }
Create dataset for barplot comparing output
library(ggplot2)
mod_comp <- data.frame(model=NA,BIC=NA)
rescale data
rescale the 1-9 expectancy values to be on a 0-1 scale.
stan cannot deal with the extreme limits of the beta distribution, so make the rescaled limits just above 0 and just below 1.
Note that when a value had to be imputed because it was missing, it will not be an integer. Thus the function needs to allow for ranges between values.
library(scales)
# rescale and flip so that we are effectively rating the expectation that they WILL NOT hear a scream to match stan
## rescaling such that the distribution spaces the numbers 1-9 evenly. the first interval upper bound would be 0.11, then 0.22 etc. this means that the mid point of each interval will be:
print("mid point of each evenly spaced interval representing values between 1-9")
[1] "mid point of each evenly spaced interval representing values between 1-9"
seq(0.5/9,1,1/9)
[1] 0.05555556 0.16666667 0.27777778 0.38888889 0.50000000 0.61111111 0.72222222 0.83333333 0.94444444
## thus 1 will be 1-0.055 etc.
## NOTE: might want to consider making this more flexible. enter the number of choice options as a variable - would be very easy. add to function library at a later stage
scale_flare <- function(x) {
  vals <- seq(0.5 / 9, 1, 1 / 9)
  for (val in 1:9) {
    if (x > val - 1 & x <= val) {
      x <- 1 - vals[val]
    }
  }
  return(x)
}
## initialise minus_scaled dataframe.
minus_scaled <- data.frame(matrix(ncol = dim(minus)[2], nrow = dim(minus)[1]))
## populate with rescaled values
for (sub in 1:dim(minus)[1]) {
  for (col in 1:dim(minus)[2]) {
    minus_scaled[sub, col] <- scale_flare(minus[sub, col])
  }
}
## ditto for plus
plus_scaled <- data.frame(matrix(ncol = dim(plus)[2], nrow = dim(plus)[1]))
for (sub in 1:dim(plus)[1]) {
  for (col in 1:dim(plus)[2]) {
    plus_scaled[sub, col] <- scale_flare(plus[sub, col])
  }
}
## this is the number that will take us from the midpoint to the top and bottom for the new boundaries (with ratings representing the midpoints)
cdf_scale <- 1/18
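As a quick check of the mapping, a vectorised equivalent of scale_flare (a sketch; any rating in (val-1, val] maps to 1 minus the val-th midpoint):

```r
# Vectorised sketch of the rescale-and-flip: ceiling() finds which of
# the nine intervals a rating falls in, then we return 1 - midpoint.
scale_flare_vec <- function(x) {
  vals <- seq(0.5 / 9, 1, 1 / 9) # interval midpoints
  1 - vals[ceiling(x)]
}
```

So a rating of 5 (the scale midpoint) maps to 0.5, and a rating of 1 maps to 1 - 0.5/9 ≈ 0.944, consistent with the flip to rating the expectation of no scream.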
Set up stan
These use Alex Pike's RStan script, with minor modifications to make it punishment-only, to see if it runs. Testing that the approach works with the current data set-up etc.
The settings for the script are below, including stan chain parameters and directory set up.
This loads the libraries and source files needed to run this script, and sets up RStan
Stan data
## Test data (Pilot + TRT + Validation) proportion no screams
#data
data_files<-list(ntrials=ntrials,nsub=nsub,nothingPlus = No_scream_p, nothingMinus=No_scream_m,ratingsPlus=plus_scaled,ratingsMinus=minus_scaled)
## Test data (Pilot + TRT + Validation) proportion screams, no log likelihood
flare_data_nolog <-list(ntrials=ntrials,nsub=nsub,screamPlus = t(screamPlus), screamMinus= t(screamMinus),ratingsPlus=t(plus_scaled),ratingsMinus=t(minus_scaled))
## Test data (Pilot + TRT + Validation) proportion screams, with cdf_scale for the log likelihood
flare_data<-list(ntrials=ntrials,nsub=nsub,screamPlus = t(screamPlus), screamMinus=t(screamMinus),ratingsPlus=t(plus_scaled),ratingsMinus=t(minus_scaled),cdf_scale=cdf_scale)
## Validation data (TEDS) scaled
#
# TEDS_data<-list(ntrials=ntrials,nsub=nsub,screamPlus = t(screamPlus), screamMinus= t(screamMinus),ratingsPlus=t(plus_scaled),ratingsMinus=t(minus_scaled))
Baseline models
Model 1: single beta no scaling
notes
Because we use the 1 - rescaled expectancy data, there is no need to invert the reinforcement parameters here. As a result we need the stan model to simply be:
alphaPlus[p] = nothingPlus[p]/ntrials;
alphaMinus[p] = nothingMinus[p]/ntrials;
run Alex Pike’s stan script for non scaled beta model.
here we try to estimate the alpha parameter of the beta distribution per trial per person per stimulus (i.e. you have two sufficient parameters for each beta dist, the alpha and the beta; we want to estimate the alpha).
Eventually we will scale these by the actual ‘value’ of the scream for each person per trial.
Using data loaded in from preliminary tests above.
so this is a beta value per person (assuming the underlying process for the plus and minus are the same)
Model
#script
stanname='beta_noscaling.stan'
#data
data_files<-list(ntrials=ntrials,nsub=nsub,nothingPlus = No_scream_p, nothingMinus=No_scream_m,ratingsPlus=plus_scaled,ratingsMinus=minus_scaled)
flare_fit <- model_run('load','beta_noscaling.stan',data_files)
## get some basic output descriptions printed to screen
out_describe(flare_fit,nsub)
# extract fit data
summary_flare <- summary(flare_fit)
Model 2: single beta scaled
notes
Simple alteration of the first model. We estimate a scaling parameter per person over all trials and apply this to alpha component per participant.
run Alex Pike’s stan script for scaled beta model.
here we try to estimate the alpha parameter of the beta distribution per trial per person per stimulus (i.e. you have two sufficient parameters for each beta dist, the alpha and the beta; we want to estimate the alpha).
Eventually we will scale these by the actual ‘value’ of the scream for each person per trial.
Using data loaded in from preliminary tests above.
so this is a beta value per person (assuming the underlying process for the plus and minus are the same)
Model
flare_fit <- model_run('max',stanname,data_files)
[1] "running model"
starting worker pid=49421 on localhost:11817 at 12:36:45.564
starting worker pid=49431 on localhost:11817 at 12:36:45.790
SAMPLING FOR MODEL 'beta_scaling' NOW (CHAIN 1).
Chain 1:
Chain 1: Gradient evaluation took 0.001189 seconds
Chain 1: 1000 transitions using 10 leapfrog steps per transition would take 11.89 seconds.
Chain 1: Adjust your expectations accordingly!
Chain 1:
Chain 1:
Chain 1: Iteration: 1 / 2000 [ 0%] (Warmup)
SAMPLING FOR MODEL 'beta_scaling' NOW (CHAIN 2).
Chain 2:
Chain 2: Gradient evaluation took 0.002262 seconds
Chain 2: 1000 transitions using 10 leapfrog steps per transition would take 22.62 seconds.
Chain 2: Adjust your expectations accordingly!
Chain 2:
Chain 2:
Chain 2: Iteration: 1 / 2000 [ 0%] (Warmup)
Chain 2: Iteration: 200 / 2000 [ 10%] (Warmup)
Chain 1: Iteration: 200 / 2000 [ 10%] (Warmup)
Chain 2: Iteration: 400 / 2000 [ 20%] (Warmup)
Chain 1: Iteration: 400 / 2000 [ 20%] (Warmup)
Chain 2: Iteration: 600 / 2000 [ 30%] (Warmup)
Chain 1: Iteration: 600 / 2000 [ 30%] (Warmup)
Chain 2: Iteration: 800 / 2000 [ 40%] (Warmup)
Chain 1: Iteration: 800 / 2000 [ 40%] (Warmup)
Chain 2: Iteration: 1000 / 2000 [ 50%] (Warmup)
Chain 2: Iteration: 1001 / 2000 [ 50%] (Sampling)
Chain 1: Iteration: 1000 / 2000 [ 50%] (Warmup)
Chain 1: Iteration: 1001 / 2000 [ 50%] (Sampling)
Chain 1: Iteration: 1200 / 2000 [ 60%] (Sampling)
Chain 2: Iteration: 1200 / 2000 [ 60%] (Sampling)
Chain 1: Iteration: 1400 / 2000 [ 70%] (Sampling)
Chain 1: Iteration: 1600 / 2000 [ 80%] (Sampling)
Chain 2: Iteration: 1400 / 2000 [ 70%] (Sampling)
Chain 1: Iteration: 1800 / 2000 [ 90%] (Sampling)
Chain 1: Iteration: 2000 / 2000 [100%] (Sampling)
Chain 1:
Chain 1: Elapsed Time: 122.944 seconds (Warm-up)
Chain 1: 43.873 seconds (Sampling)
Chain 1: 166.817 seconds (Total)
Chain 1:
Chain 2: Iteration: 1600 / 2000 [ 80%] (Sampling)
Chain 2: Iteration: 1800 / 2000 [ 90%] (Sampling)
Chain 2: Iteration: 2000 / 2000 [100%] (Sampling)
Chain 2:
Chain 2: Elapsed Time: 117.623 seconds (Warm-up)
Chain 2: 83.1269 seconds (Sampling)
Chain 2: 200.749 seconds (Total)
Chain 2:
Warning message:
package ‘StanHeaders’ was built under R version 3.5.2

## get some basic output descriptions printed to screen
out_describe(flare_fit,nsub)
[1] "2000 iterations on 2 chains. "
[1] "Estimated 2 Free parameters per person"
# extract fit data
summary_flare <- summary(flare_fit)
run Alex Pike's stan script for the beta model with reinforcement learning (estimating both the beta and the alpha, i.e. learning rate, per person).
Using data loaded in from preliminary tests above.
#script
stanname='beta_withRL.stan'
flare_fit <- model_run('med',stanname,flare_data)
[1] "running model"
SAMPLING FOR MODEL 'beta_withRL' NOW (CHAIN 1).
Chain 1:
Chain 1: Gradient evaluation took 0.006291 seconds
Chain 1: 1000 transitions using 10 leapfrog steps per transition would take 62.91 seconds.
Chain 1: Adjust your expectations accordingly!
Chain 1:
Chain 1:
Chain 1: Iteration: 1 / 1000 [ 0%] (Warmup)
Chain 1: Iteration: 100 / 1000 [ 10%] (Warmup)
Chain 1: Iteration: 200 / 1000 [ 20%] (Warmup)
Chain 1: Iteration: 300 / 1000 [ 30%] (Warmup)
Chain 1: Iteration: 400 / 1000 [ 40%] (Warmup)
Chain 1: Iteration: 500 / 1000 [ 50%] (Warmup)
Chain 1: Iteration: 501 / 1000 [ 50%] (Sampling)
Chain 1: Iteration: 600 / 1000 [ 60%] (Sampling)
Chain 1: Iteration: 700 / 1000 [ 70%] (Sampling)
Chain 1: Iteration: 800 / 1000 [ 80%] (Sampling)
Chain 1: Iteration: 900 / 1000 [ 90%] (Sampling)
Chain 1: Iteration: 1000 / 1000 [100%] (Sampling)
Chain 1:
Chain 1: Elapsed Time: 92.0107 seconds (Warm-up)
Chain 1: 62.1928 seconds (Sampling)
Chain 1: 154.204 seconds (Total)
Chain 1:

## get some basic output descriptions printed to screen
out_describe(flare_fit,nsub)
[1] "1000 iterations on 1 chains. "
[1] "Estimated 2 Free parameters per person"
# extract fit data
summary_flare <- summary(flare_fit)
Model 3: RL, mean defined, single beta
notes
this model includes an alpha learning parameter per person, estimating their learning rate and updating based on it. This model needs a dataset that indicates whether a scream occurred on each trial, instead of the proportion of times no scream occurred.
Mean to define shape
Alex used this stack post to help solve for the shape parameters using the mean and sd, where we assume that v serves as the mean and beta as the sd.
the equations work out to this:
for shape 1:
\[\alpha = \left(\frac{1-\mu}{\sigma^2} - \frac{1}{\mu}\right)\mu^2\]
for shape 2:
\[\beta=\alpha \left(\frac{1}{\mu}-1\right)\]
Model
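Although the Stan model using this parameterisation failed to run (below), the algebra itself can be sanity-checked in R (function name hypothetical): given mu and sigma (as sd), the recovered shape parameters should reproduce the target mean and variance.

```r
# Sanity-check sketch of the mean/sd-to-shape conversion above:
# alpha = ((1 - mu)/sigma^2 - 1/mu) * mu^2, beta = alpha * (1/mu - 1)
shapes_from_mean_sd <- function(mu, sigma) {
  a <- ((1 - mu) / sigma^2 - 1 / mu) * mu^2
  b <- a * (1 / mu - 1)
  c(shape1 = a, shape2 = b)
}

s <- shapes_from_mean_sd(0.6, 0.1)
# mean of Beta(a, b) is a/(a+b); variance is ab/((a+b)^2 (a+b+1))
```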
this way of defining the mean does not work or even run, so skipping it.
#script
stanname='beta_meansd_RL.stan'
flare_fit <- model_run('skip',stanname,flare_data_nolog)
## get some basic output descriptions printed to screen
out_describe(flare_fit,nsub)
# extract fit data
summary_flare <- summary(flare_fit)
On 500 iterations (i.e. test) the variance in alpha is good, but the traceplot is terrible. The model converges very poorly. We also have to constrain the beta to be between 0 and 0.0001. Not sure why this is.
When running for 2000 iterations (1000 warmup), this results in the following warnings:
There were 2644 divergent transitions after warmup. Increasing adapt_delta above 0.8 may help. See http://mc-stan.org/misc/warnings.html#divergent-transitions-after-warmup
There were 4 transitions after warmup that exceeded the maximum treedepth. Increase max_treedepth above 10. See http://mc-stan.org/misc/warnings.html#maximum-treedepth-exceeded
There were 4 chains where the estimated Bayesian Fraction of Missing Information was low. See http://mc-stan.org/misc/warnings.html#bfmi-low
Examine the pairs() plot to diagnose sampling problems
Mean definition 2
The above mean definition does not map to the data well (terrible traceplot!). I found this from the MRC BSU and have tried defining the beta parameters, assuming V == mean, in a slightly different way:
for parameter a:
\[\alpha = \mu\beta/(1-\mu)\]
for parameter b:
\[\beta = \mu(1-\mu)^2/\sigma+\mu-1\]
Model
Still using a single beta here.
skipping as it also does not run
#script
stanname='beta_meansd_RL_2.stan'
flare_fit <- model_run('skip',stanname,flare_data_nolog)
## get some basic output descriptions printed to screen
out_describe(flare_fit,nsub)
# extract fit data
summary_flare <- summary(flare_fit)
Mean definition 3
noted that the shape parameters have slight variations in definition according to the discussion here. Updated the script slightly to reflect this, based on the reply from ocram.
the first term in shape a is changed from the variance (sigma squared) to the sd (sigma), so it changes from:
\[\alpha = \left(\frac{1-\mu}{\sigma^2} - \frac{1}{\mu}\right)\mu^2\]
to
\[\alpha = \left(\frac{1-\mu}{\sigma} - \frac{1}{\mu}\right)\mu^2\]
Changes the shape 2 parameter definition from:
\[\beta=\alpha \left(\frac{1}{\mu}-1\right)\]
to
\[\beta = \left(\frac{1-\mu}{\sigma} - \frac{1}{\mu}\right)\mu\left(1-\mu\right)\]
Because this works best, I will add the log likelihood calculation here, basing it on the probability density function of the beta distribution given the participant's actual ratings and the sufficient parameters of the distribution per trial.
loglik[p] = loglik[p] + beta_lpdf(ratingsPlus[t,p]|shape1_Plus[t,p],shape2_Plus[t,p]) + beta_lpdf(ratingsMinus[t,p]|shape1_Minus[t,p],shape2_Minus[t,p])
Model
Skip as this definition also does not work or run
#script
stanname='beta_meansd_RL_3.stan'
flare_fit <- model_run('skip',stanname,flare_data_nolog)
## get some basic output descriptions printed to screen
out_describe(flare_fit,nsub)
# extract fit data
summary_flare <- summary(flare_fit)
This model is substantially better than either of the other two. The traceplot suggests that the iterations converge as we would like. However, we still need to heavily constrain the beta (i.e. confidence / uncertainty) estimates for it to run; otherwise the starting values drop below zero.
Mean definition 4
Here I try to define the parameters using simplified mean and precision estimates as per this tutorial; see in particular the parameter estimation on the cubs data.
This results in a relatively simplified parameter estimation compared to model 3.
\[\alpha = \mu * ((\mu * (1-\mu)) / \sigma - 1)\]
where mu is the mean (or value) and sigma is the variance / uncertainty parameter we currently call beta.
and the b (or shape 2) parameter for the distribution is:
\[\beta = (1- \mu) * ((\mu * (1-\mu)) / \sigma - 1)\]
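A short R sketch of this parameterisation (hypothetical helper names). Expanding the definition-3 formulas shows they are algebraically identical to definition 4 when sigma denotes the variance in both, which the comparison below confirms numerically:

```r
# Hypothetical helper: mean definition 4, with mu the mean (value) and
# sigma the variance (the parameter we currently call beta).
beta_shapes_meandef4 <- function(mu, sigma) {
  k <- mu * (1 - mu) / sigma - 1  # shared factor in both shapes
  c(shape1 = mu * k, shape2 = (1 - mu) * k)
}

# Definition 3 written out directly, for comparison.
beta_shapes_meandef3 <- function(mu, sigma) {
  c(shape1 = ((1 - mu) / sigma - 1 / mu) * mu^2,
    shape2 = ((1 - mu) / sigma - 1 / mu) * mu * (1 - mu))
}

beta_shapes_meandef4(0.5, 0.05)  # Beta(2, 2), same as definition 3
```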
Model
#script
stanname='beta_meansd_RL_4.stan'
flare_fit <- model_run('full',stanname,flare_data)
[1] "running model"
starting worker pid=49897 on localhost:11817 at 12:44:22.799
starting worker pid=49907 on localhost:11817 at 12:44:23.045
starting worker pid=49917 on localhost:11817 at 12:44:23.284
starting worker pid=49927 on localhost:11817 at 12:44:23.510
SAMPLING FOR MODEL 'beta_meansd_RL_4' NOW (CHAIN 1).
SAMPLING FOR MODEL 'beta_meansd_RL_4' NOW (CHAIN 2).
SAMPLING FOR MODEL 'beta_meansd_RL_4' NOW (CHAIN 3).
SAMPLING FOR MODEL 'beta_meansd_RL_4' NOW (CHAIN 4).
Chain 1:
Chain 1: Gradient evaluation took 0.015671 seconds
Chain 1: 1000 transitions using 10 leapfrog steps per transition would take 156.71 seconds.
Chain 1: Adjust your expectations accordingly!
Chain 1:
Chain 1:
Chain 1: Iteration: 1 / 4000 [ 0%] (Warmup)
Chain 2:
Chain 2: Gradient evaluation took 0.014001 seconds
Chain 2: 1000 transitions using 10 leapfrog steps per transition would take 140.01 seconds.
Chain 2: Adjust your expectations accordingly!
Chain 2:
Chain 2:
Chain 2: Iteration: 1 / 4000 [ 0%] (Warmup)
Chain 3:
Chain 3: Gradient evaluation took 0.013264 seconds
Chain 3: 1000 transitions using 10 leapfrog steps per transition would take 132.64 seconds.
Chain 3: Adjust your expectations accordingly!
Chain 3:
Chain 3:
Chain 4:
Chain 4: Gradient evaluation took 0.015048 seconds
Chain 4: 1000 transitions using 10 leapfrog steps per transition would take 150.48 seconds.
Chain 4: Adjust your expectations accordingly!
Chain 4:
Chain 4:
Chain 3: Iteration: 1 / 4000 [ 0%] (Warmup)
Chain 4: Iteration: 1 / 4000 [ 0%] (Warmup)
Chain 3: Iteration: 400 / 4000 [ 10%] (Warmup)
Chain 4: Iteration: 400 / 4000 [ 10%] (Warmup)
Chain 1: Iteration: 400 / 4000 [ 10%] (Warmup)
Chain 2: Iteration: 400 / 4000 [ 10%] (Warmup)
Chain 4: Iteration: 800 / 4000 [ 20%] (Warmup)
Chain 2: Iteration: 800 / 4000 [ 20%] (Warmup)
Chain 3: Iteration: 800 / 4000 [ 20%] (Warmup)
Chain 1: Iteration: 800 / 4000 [ 20%] (Warmup)
Create BIC from log likelihood
## extract log likelihood
flare_loglike <- extract_log_lik(flare_fit, parameter_name = "loglik", merge_chains = TRUE)
#calculate BIC
FLARe_bic<-bic(ntrials,-colMeans(flare_loglike),2) # number of parameters in this model (alpha, beta)
## mean BIC as model comparisons tool:
print("Mean Bayesian information criterion for model")
mean(FLARe_bic)
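For reference, a minimal sketch of what the bic() helper above is assumed to compute (BIC = k·ln(n) + 2·NLL, matching the argument order used in the calls here; the real helper may differ):

```r
# Hypothetical stand-in for the bic() helper used throughout:
#   n       - number of observations (trials)
#   negll   - negative log likelihood, one value per participant
#   nparams - number of free parameters in the model
bic <- function(n, negll, nparams) {
  nparams * log(n) + 2 * negll
}

bic(48, 10, 2)  # one participant with NLL = 10 over 48 trials
```

Passing a vector of per-participant NLLs returns a per-participant BIC, which is what mean(FLARe_bic) then averages.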
Add to bar plot
mod_comp <- rbind(mod_comp,c("Means 1 beta",as.numeric(mean(FLARe_bic))))
mod_comp$BIC <- odp(as.numeric(mod_comp$BIC))
mod_comp <- as.data.frame(na.omit(mod_comp))
## plot function - create plot
plot_models(mod_comp)
Model 4: RL, mode defined, single beta
notes
Used this post to guide this. particularly:
For a beta distribution with shape parameters a and b, the mode is (a-1)/(a+b-2). Suppose we have a desired mode, and we want to determine the corresponding shape parameters. Here’s the solution. First, we express the “certainty” of the estimate in terms of the equivalent prior sample size, k=a+b, with k≥2. The certainty must be at least 2 because it essentially assumes that the prior contains at least one “head” and one “tail,” which is to say that we know each outcome is at least possible. Then a little algebra reveals: a = mode * (k-2) + 1 b = (1-mode) * (k-2) + 1
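The algebra in that quote can be checked with a short R sketch (the mode and k values are illustrative):

```r
# Shape parameters from a desired mode and certainty k = a + b (k >= 2).
shapes_from_mode <- function(mode, k) {
  c(shape1 = mode * (k - 2) + 1,
    shape2 = (1 - mode) * (k - 2) + 1)
}

# The mode of Beta(a, b) is (a - 1) / (a + b - 2), so we should recover
# the mode we asked for.
s <- shapes_from_mode(0.3, 10)  # Beta(3.4, 6.6)
(s[["shape1"]] - 1) / (s[["shape1"]] + s[["shape2"]] - 2)  # 0.3
```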
shape 1 as mode with v and beta as beta shape parameters
For this version we try to estimate the 'mode' as shape 1. KIRSTIN:: explain here
Model
This first attempt doesn't work, so skip it.
#script
stanname='beta_mode_RL.stan'
flare_fit <- model_run('skip',stanname,flare_data_nolog)
## get some basic output descriptions printed to screen
out_describe(flare_fit,nsub)
# extract fit data
summary_flare <- summary(flare_fit)
v as mode
For this version we assume that V is the mode (above we assumed it served as the mean) and beta is the certainty aspect (i.e. k).
What this does is essentially treat the expected rating (value) as the a parameter for the distribution (scaled by their certainty, beta) and 1 minus that value as the b parameter (again scaled by the certainty).
So you have a ratio of their selected value per trial (the mode across iterations?) to how far from the highest possible choice they are.
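A sketch of that idea (hypothetical helper; the exact Stan parameterisation may differ): scale v and 1 - v by a certainty parameter to get the two shapes:

```r
# Hypothetical: value v weights shape 1 and 1 - v weights shape 2, each
# scaled by a certainty parameter k (larger k = tighter distribution).
shapes_from_value <- function(v, k) {
  c(shape1 = v * k, shape2 = (1 - v) * k)
}

# Under this scaling the distribution's mean is exactly v, and its
# variance shrinks as k grows.
shapes_from_value(0.7, 20)  # approximately Beta(14, 6)
```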
Model
#script
stanname='beta_mode_RL_2.stan'
flare_fit <- model_run('med',stanname,flare_data)
## get some basic output descriptions printed to screen
out_describe(flare_fit,nsub)
# extract fit data
summary_flare <- summary(flare_fit)
This works, but there is not a lot of variance in the alpha parameter when defined by the mode (mean 0.49, sd = 0.06), compared to when defined by the mean (mean 0.54, sd = 0.26).
However, there is a lot of variation in the beta parameter (mean -7.21, sd = 134.74).
Create BIC from log likelihood
## extract log likelihood
flare_loglike <- extract_log_lik(flare_fit, parameter_name = "loglik", merge_chains = TRUE)
#calculate BIC
FLARe_bic<-bic(ntrials,-colMeans(flare_loglike),2) #number of parameters in that model e.g. 4)
## mean BIC as model comparisons tool:
print("Mean Bayesian information criterion for model")
mean(FLARe_bic)
Add to bar plot
mod_comp <- rbind(mod_comp,c("Mode 1 beta",as.numeric(mean(FLARe_bic))))
mod_comp$BIC <- odp(as.numeric(mod_comp$BIC))
mod_comp <- as.data.frame(na.omit(mod_comp))
## plot function - create plot
plot_models(mod_comp)
Model 5: RL mean defined,two beta
Model
RL model adding a beta per stimulus to Alex’s model
#script
stanname='beta_meansd_2beta_RL.stan'
flare_fit_m2 <- model_run('full',stanname,flare_data)
## get some basic output descriptions printed to screen
out_describe(flare_fit_m2,nsub)
# extract fit data
summary_flare <- summary(flare_fit_m2)
Create BIC from log likelihood
## extract log likelihood
flare_loglike_m2 <- extract_log_lik(flare_fit_m2, parameter_name = "loglik", merge_chains = TRUE)
#calculate BIC
FLARe_bic_m2 <- bic(ntrials,-colMeans(flare_loglike_m2),3) #number of parameters in that model e.g. 4)
# mean for all participants
mean(FLARe_bic_m2)
Add to bar plot
mod_comp <- rbind(mod_comp,c("Means 2 beta",mean(FLARe_bic_m2)))
mod_comp$BIC <- odp(as.numeric(mod_comp$BIC))
mod_comp <- as.data.frame(na.omit(mod_comp))
## plot function - create plot
plot_models(mod_comp)
Model 6: RL, mode defined, two beta
Model
RL model adding a beta per stimulus to the model defining the beta shape using the mode instead of the mean. This definitely makes more sense, as we assume that participants will have different levels of uncertainty about each stimulus.
#script
stanname='beta_mode_2beta_RL_2.stan'
flare_fit <- model_run('full',stanname,flare_data_nolog)
## get some basic output descriptions printed to screen
out_describe(flare_fit,nsub)
# extract fit data
summary_flare <- summary(flare_fit)
The alpha parameter variance looks normal (mean 0.4, sd 0.12). Beta is much more bounded now, though (combined across both stimuli: mean 0.79, sd = 1.6), over 4000 iterations on 4 chains.
Create BIC from log likelihood
## extract log likelihood
flare_loglike <- extract_log_lik(flare_fit, parameter_name = "loglik", merge_chains = TRUE)
#calculate BIC
FLARe_bic<-bic(ntrials,-colMeans(flare_loglike),3) # number of parameters in this model (alpha plus a beta per stimulus)
## mean BIC as model comparisons tool:
print("Mean Bayesian information criterion for model")
mean(FLARe_bic)
Add to bar plot
mod_comp <- rbind(na.omit(mod_comp),c("Mode 2 beta",mean(FLARe_bic)))
mod_comp$BIC <- odp(as.numeric(mod_comp$BIC))
mod_comp <- as.data.frame(na.omit(mod_comp))
## plot function - create plot
plot_models(mod_comp)
Model 7: RL mean defined, no beta
Model
The beta does not work as well for the CS+ stimulus, so we need to check whether this parameter adds anything to the model: drop it from our best mean model and see how this changes the fit.
This takes forever to run and the log likelihood fails, so no idea whether it is good yet. Come back to this; skipped for now.
#script
stanname='beta_meansd_RL_NoBeta.stan'
flare_fit <- model_run('skip',stanname,flare_data_nolog)
## get some basic output descriptions printed to screen
out_describe(flare_fit,nsub)
# extract fit data
summary_flare <- summary(flare_fit)
Create BIC from log likelihood
# ## extract log likelihood
# flare_loglike <- extract_log_lik(flare_fit, parameter_name = "loglik", merge_chains = TRUE)
# #calculate BIC
# FLARe_bic <- bic(ntrials,-colMeans(flare_loglike),2) # number of parameters in the model
# ## mean BIC as model comparison tool:
# print("Mean Bayesian information criterion for model")
# mean(FLARe_bic)
Add to bar plot
# mod_comp <- rbind(na.omit(mod_comp),c("Mean no beta",mean(FLARe_bic)))
# mod_comp$BIC <- odp(as.numeric(mod_comp$BIC))
# mod_comp <- as.data.frame(na.omit(mod_comp))
## plot function - create plot
# plot_models(mod_comp)
Generate and recover
Here I test whether the model is working well by checking whether the parameters we've estimated can be used to generate our existing rating data, and whether we can then recover similar parameters from the generated data.
I will do this for the best fitting model (mean-defined beta distribution with a variance estimate per person for each stimulus). This is the model where we treat the iterated ratings as 'expected' values and use them as the shape 1 parameter of our beta distribution at each trial. We have allowed a beta (or uncertainty) parameter per stimulus.
A good model will show a) a good correlation between the real data and the generated data and b) a good correlation between the parameter estimates from the real and generated data.
We essentially want to replicate our Stan script, but instead of estimating parameters we assume that we know what they are (i.e. use the alphas and betas we estimated previously).
Update: it turns out the single-beta model is the best fitting model once I correct my BIC function to use the negative log likelihood. So I will also generate and recover for this model and use it as the comparator.
Mean 1 beta
Generate
Make alpha / beta datasets p/p
Use the summary of the stan model to extract the different parameters we want to try to use to recreate our data.
params <- summary(flare_fit_best)
alpha_est <- data.frame(params$summary[1:nsub,1])
beta <- data.frame(params$summary[(nsub+1):(nsub*2),1])
names(beta) <- "beta"
Initialise empty datasets to hold the predicted ratings
rating_est_plus <- data.frame(matrix(ncol=ntrials,nrow=nsub))
rating_est_minus <- data.frame(matrix(ncol=ntrials,nrow=nsub))
# beta shape parameters
shape1p <- data.frame(matrix(ncol=ntrials,nrow=nsub))
shape1m <- data.frame(matrix(ncol=ntrials,nrow=nsub))
shape2p <- data.frame(matrix(ncol=ntrials,nrow=nsub))
shape2m <- data.frame(matrix(ncol=ntrials,nrow=nsub))
# V parameters (initialised at 0.5)
vp <- data.frame(matrix(ncol=ntrials,nrow=nsub))
vm <- data.frame(matrix(ncol=ntrials,nrow=nsub))
vp[1] <- 0.5
vm[1] <- 0.5
# prediction error
dp <- data.frame(matrix(ncol=(ntrials-1),nrow=nsub))
dm <- data.frame(matrix(ncol=(ntrials-1),nrow=nsub))
Simulate ratings
Use our extracted parameters in place of estimating them, following the Stan syntax.
Populate our vplus and delta frames
use the alpha parameters we've extracted (alpha_est). d == delta (prediction error); v == value (i.e. value for each stimulus)
for (p in 1:nsub){
for (t in 1:(ntrials-1)){
dp[p,t] <- screamPlus[p,t]-vp[p,t]
dm[p,t] <- screamMinus[p,t]-vm[p,t]
vp[p,t+1] <- vp[p,t]+alpha_est[p,1]*dp[p,t]
vm[p,t+1]<- vm[p,t]+alpha_est[p,1]*dm[p,t]
}
}
For reference, the corresponding shape / likelihood block from the Stan script (Stan syntax, not R; the R translation follows below):
for (t in 1:ntrials){
shape1_Plus[t,p] = VPlus[t,p] * ((VPlus[t,p] * (1-VPlus[t,p])) / beta[p,1]);
shape1_Minus[t,p] = VMinus[t,p] * ((VMinus[t,p] * (1-VMinus[t,p])) / beta[p,2]);
shape2_Plus[t,p] = (1-VPlus[t,p]) * ((VPlus[t,p] * (1-VPlus[t,p])) / beta[p,1]);
shape2_Minus[t,p] = (1-VMinus[t,p]) * ((VMinus[t,p] * (1-VMinus[t,p])) / beta[p,2]);
ratingsPlus[t,p] ~ beta(shape1_Plus[t,p],shape2_Plus[t,p]);
ratingsMinus[t,p] ~ beta(shape1_Minus[t,p],shape2_Minus[t,p]);
}
} }
populate beta parameter shape frames
Use the new v frames and beta parameters.
Shape 1 and 2 are sufficient parameters for the beta distribution
for (p in 1:nsub){
for (t in 1:ntrials){
shape1p[p,t] = vp[p,t] * ((vp[p,t] * (1-vp[p,t])) / beta[p,1])
shape1m[p,t] = vm[p,t] * ((vm[p,t] * (1-vm[p,t])) / beta[p,1])
shape2p[p,t] = (1-vp[p,t]) * ((vp[p,t] * (1-vp[p,t])) / beta[p,1])
shape2m[p,t] = (1-vm[p,t]) * ((vm[p,t] * (1-vm[p,t])) / beta[p,1])
}
}
Estimate ratings
Using rbeta here (which draws random samples from the beta distribution with the given shape parameters).
For now, taking 1000 draws per trial and using their average as the estimated rating…
for (p in 1:nsub){
for (t in 1:ntrials){
rating_est_plus[p,t] <- mean(rbeta(1000,shape1p[p,t],shape2p[p,t]))
rating_est_minus[p,t] <- mean(rbeta(1000,shape1m[p,t],shape2m[p,t]))
}
}
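Taking the mean of many rbeta() draws, as above, just approximates the analytic mean shape1 / (shape1 + shape2); a quick check:

```r
set.seed(1)
shape1 <- 2; shape2 <- 6
simulated <- mean(rbeta(1e5, shape1, shape2))
analytic  <- shape1 / (shape1 + shape2)  # 0.25

# the two agree to within Monte Carlo error
abs(simulated - analytic)
```

(We could skip the simulation and use the analytic mean directly, at the cost of losing the sampling noise.)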
Rescale simulated ratings
You could argue that these should match the discrete nature of the original rating scale, which we effectively undid in our script. The following enables this.
HOWEVER: we are reducing variance massively this way, so it might be better to leave the recovered ratings unscaled…
So - the following discrete values exist in our rescaled ratings:
table(plus_scaled$X1)
Anything that falls 0.05555556 above or below one of these values will be set to that median point. Note that 0.05555556 is the cdf_scale factor we used in the script to capture the full area under the curve for each segment of the distribution represented by the discrete ratings of 1-9.
Write the function to rescale
scale_simulated <- function(x){
scaled_list <- array(unique(plus_scaled$X1))
for (val in scaled_list[1:length(scaled_list)]){
if (x > val-cdf_scale & x < val+cdf_scale){
x <- val
}
}
return(x)
}
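A self-contained illustration of the snapping logic (the grid and cdf_scale here are illustrative stand-ins; the real script snaps to the unique values in plus_scaled):

```r
# Illustrative: snap a value to a discrete grid point if it falls within
# +/- cdf_scale of that point (mirrors scale_simulated above).
cdf_scale <- 1 / 18                        # 0.05555556, half-width of each segment
grid <- seq(1 / 18, 17 / 18, by = 2 / 18)  # hypothetical 9-point rescaled grid

snap <- function(x) {
  for (val in grid) {
    if (x > val - cdf_scale & x < val + cdf_scale) x <- val
  }
  x
}

snap(0.30)  # falls in the segment around 5/18, so snaps to 0.2777778
```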
apply it to the simulated rating frames.
(uncomment to run this)
## initialise dataframes
#
# est_plus_scaled <- data.frame(matrix(ncol=dim(rating_est_plus)[2],nrow = dim(rating_est_plus)[1]))
# est_minus_scaled <- data.frame(matrix(ncol=dim(rating_est_minus)[2],nrow = dim(rating_est_minus)[1]))
#
# ## populate with rescaled values
#
# for (sub in 1:dim(rating_est_plus)[1]){
# for (col in 1:dim(rating_est_plus)[2]){
#
# est_plus_scaled[sub,col] <- scale_simulated(rating_est_plus[sub,col])
# }
# }
#
# for (sub in 1:dim(rating_est_minus)[1]){
# for (col in 1:dim(rating_est_minus)[2]){
#
# est_minus_scaled[sub,col] <- scale_simulated(rating_est_minus[sub,col])
# }
# }
Correlate actual ratings with simulated ratings
use the simulated ratings per person that we have derived using our parameters and see how well they align with the real ratings…
Only showing the diagonals from corr.test (psych package) here to get the important t1 x t1 etc. values.
This will use either the rating_est frames (rating_est_plus; rating_est_minus) or the est_scaled frames (est_minus_scaled; est_plus_scaled), depending on whether we opt to rescale or not.
print("real ratings with estimated ratings: CS MINUS")
diag(corr.test(rating_est_minus,minus_scaled)$r)
print("real ratings with estimated ratings: CS MINUS (average for all trials)")
cor(rowMeans(rating_est_minus),rowMeans(minus_scaled))
print("real ratings with estimated ratings: CS PLUS")
diag(corr.test(rating_est_plus,plus_scaled)$r)
print("real ratings with estimated ratings: CS PLUS (average for all trials)")
cor(rowMeans(rating_est_plus),rowMeans(plus_scaled))
Recover
Here we check whether we can recover the same estimates from the simulated ratings: run Stan using the estimated ratings instead of the real ones, and see whether we get the same alpha / beta parameters.
We might decide to use the rescaled estimates here to be more comparable…
run stan model
RL model adding a beta per stimulus to Alex’s model
#script
stanname='beta_meansd_RL_4.stan'
# data
flare_data_rec <-list(ntrials=ntrials,nsub=nsub,screamPlus = t(screamPlus), screamMinus= t(screamMinus),ratingsPlus=t(rating_est_plus),ratingsMinus=t(rating_est_minus),cdf_scale=cdf_scale)
flare_fit_rec <- model_run('full',stanname,flare_data_rec)
## get some basic output descriptions printed to screen
out_describe(flare_fit_rec,nsub)
Make alpha / beta datasets p/p
Use the summary of the stan model to extract the different parameters we want to try to use to recreate our data.
params_rec <- summary(flare_fit_rec)
alpha_est_rec <- data.frame(params_rec$summary[1:nsub,1])
beta_rec <- data.frame(params_rec$summary[(nsub+1):(nsub*2),1])
names(beta_rec) <- "beta_rec"
Correlate original parameters with recovered parameters
Compare the parameters recovered from the simulated ratings with the original estimates and see how well they align…
Only showing the diagonals from corr.test (psych package) here.
print("original with recovered: ALPHA")
diag(corr.test(alpha_est_rec,alpha_est)$r)
print("original with recovered: BETA")
diag(corr.test(beta_rec,beta)$r)
Mean 2 beta
Generate
Make alpha / beta datasets p/p
Use the summary of the stan model to extract the different parameters we want to try to use to recreate our data.
params <- summary(flare_fit_m2)
alpha_est <- data.frame(params$summary[1:nsub,1])
beta_plus <- data.frame(matrix(ncol = 1,nrow=nsub))
beta_minus <- data.frame(matrix(ncol = 1,nrow=nsub))
names(beta_plus) <- "beta_plus"
names(beta_minus) <- "beta_minus"
subp = 0
subm = 0
for ( i in 343:1026){
if (i%%2 == 1){
subp= subp+1
beta_plus[subp,1] <- params$summary[i,1]
} else if (i%%2 == 0) {
subm= subm+1
beta_minus[subm,1] <- params$summary[i,1]
}
}
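The interleaved plus/minus rows can also be split without a counter loop (a sketch assuming the same row range and odd/even ordering as above):

```r
# Odd offsets in the summary rows are CS+ betas, even offsets are CS-.
rows <- 343:1026
plus_rows  <- rows[rows %% 2 == 1]
minus_rows <- rows[rows %% 2 == 0]

# demo of the same idea on a small interleaved vector
x <- c(10, 20, 30, 40, 50, 60)
x[seq(1, length(x), by = 2)]  # 10 30 50 (odd positions)
x[seq(2, length(x), by = 2)]  # 20 40 60 (even positions)
```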
Initialise empty datasets to hold the predicted ratings
rating_est_plus <- data.frame(matrix(ncol=ntrials,nrow=nsub))
rating_est_minus <- data.frame(matrix(ncol=ntrials,nrow=nsub))
# beta shape parameters
shape1p <- data.frame(matrix(ncol=ntrials,nrow=nsub))
shape1m <- data.frame(matrix(ncol=ntrials,nrow=nsub))
shape2p <- data.frame(matrix(ncol=ntrials,nrow=nsub))
shape2m <- data.frame(matrix(ncol=ntrials,nrow=nsub))
# V parameters (initialised at 0.5)
vp <- data.frame(matrix(ncol=ntrials,nrow=nsub))
vm <- data.frame(matrix(ncol=ntrials,nrow=nsub))
vp[1] <- 0.5
vm[1] <- 0.5
# prediction error
dp <- data.frame(matrix(ncol=(ntrials-1),nrow=nsub))
dm <- data.frame(matrix(ncol=(ntrials-1),nrow=nsub))
Simulate ratings
Use our extracted parameters in place of estimating them, following the Stan syntax.
Populate our vplus and delta frames
use the alpha parameters we've extracted (alpha_est). d == delta (prediction error); v == value (i.e. value for each stimulus)
for (p in 1:nsub){
for (t in 1:(ntrials-1)){
dp[p,t] <- screamPlus[p,t]-vp[p,t]
dm[p,t] <- screamMinus[p,t]-vm[p,t]
vp[p,t+1] <- vp[p,t]+alpha_est[p,1]*dp[p,t]
vm[p,t+1]<- vm[p,t]+alpha_est[p,1]*dm[p,t]
}
}
For reference, the corresponding shape / likelihood block from the Stan script (Stan syntax, not R; the R translation follows below):
for (t in 1:ntrials){
shape1_Plus[t,p] = VPlus[t,p] * ((VPlus[t,p] * (1-VPlus[t,p])) / beta[p,1]);
shape1_Minus[t,p] = VMinus[t,p] * ((VMinus[t,p] * (1-VMinus[t,p])) / beta[p,2]);
shape2_Plus[t,p] = (1-VPlus[t,p]) * ((VPlus[t,p] * (1-VPlus[t,p])) / beta[p,1]);
shape2_Minus[t,p] = (1-VMinus[t,p]) * ((VMinus[t,p] * (1-VMinus[t,p])) / beta[p,2]);
ratingsPlus[t,p] ~ beta(shape1_Plus[t,p],shape2_Plus[t,p]);
ratingsMinus[t,p] ~ beta(shape1_Minus[t,p],shape2_Minus[t,p]);
}
} }
populate beta parameter shape frames
Use the new v frames and beta parameters.
Shape 1 and 2 are sufficient parameters for the beta distribution
for (p in 1:nsub){
for (t in 1:ntrials){
shape1p[p,t] = vp[p,t] * ((vp[p,t] * (1-vp[p,t])) / beta_plus[p,1])
shape1m[p,t] = vm[p,t] * ((vm[p,t] * (1-vm[p,t])) / beta_minus[p,1])
shape2p[p,t] = (1-vp[p,t]) * ((vp[p,t] * (1-vp[p,t])) / beta_plus[p,1])
shape2m[p,t] = (1-vm[p,t]) * ((vm[p,t] * (1-vm[p,t])) / beta_minus[p,1])
}
}
Estimate ratings
Using rbeta here (which draws random samples from the beta distribution with the given shape parameters).
For now, taking 1000 draws per trial and using their average as the estimated rating…
for (p in 1:nsub){
for (t in 1:ntrials){
rating_est_plus[p,t] <- mean(rbeta(1000,shape1p[p,t],shape2p[p,t]))
rating_est_minus[p,t] <- mean(rbeta(1000,shape1m[p,t],shape2m[p,t]))
}
}
Correlate actual ratings with simulated ratings
use the simulated ratings per person that we have derived using our parameters and see how well they align with the real ratings…
Only showing the diagonals from corr.test (psych package) here to get the important t1 x t1 etc. values.
print("real ratings with estimated ratings: CS MINUS")
diag(corr.test(rating_est_minus,minus_scaled)$r)
print("real ratings with estimated ratings: CS MINUS (average for all trials)")
cor(rowMeans(rating_est_minus),rowMeans(minus_scaled))
print("real ratings with estimated ratings: CS PLUS")
diag(corr.test(rating_est_plus,plus_scaled)$r)
print("real ratings with estimated ratings: CS PLUS (average for all trials)")
cor(rowMeans(rating_est_plus),rowMeans(plus_scaled))
Recover
Here we check whether we can recover the same estimates from the simulated ratings: run Stan using the estimated ratings instead of the real ones, and see whether we get the same alpha / beta parameters.
rescale the estimated ratings
rescale the 1-9 expectancy values to be on a 0-1 scale.
Stan cannot deal with the extreme limits of the beta distribution, so make the rescaled limits just above 0 and just below 1.
Note that when a value had to be imputed because it was missing, it will not be an integer; the function therefore needs to allow for ranges between values.
#
# minus_scaled_est <- data.frame(matrix(ncol=dim(rating_est_minus)[2],nrow = dim(rating_est_minus)[1]))
#
# ## populate with rescaled values
#
# for (sub in 1:dim(rating_est_minus)[1]){
# for (col in 1:dim(rating_est_minus)[2]){
#
# minus_scaled_est[sub,col] <- scale_flare(rating_est_minus[sub,col])
# }
# }
#
# ## ditto for plus
#
# plus_scaled_est <- data.frame(matrix(ncol=dim(rating_est_plus)[2],nrow = dim(rating_est_plus)[1]))
#
# ## populate with rescaled values
#
# for (sub in 1:dim(rating_est_plus)[1]){
# for (col in 1:dim(rating_est_plus)[2]){
#
# plus_scaled_est[sub,col] <- scale_flare(rating_est_plus[sub,col])
# }
# }
#
run stan model
RL model adding a beta per stimulus to Alex’s model
#script
stanname='beta_meansd_2beta_RL.stan'
# data
flare_data_rec <-list(ntrials=ntrials,nsub=nsub,screamPlus = t(screamPlus), screamMinus= t(screamMinus),ratingsPlus=t(rating_est_plus),ratingsMinus=t(rating_est_minus),cdf_scale=cdf_scale)
flare_fit_rec <- model_run('full',stanname,flare_data_rec)
## get some basic output descriptions printed to screen
out_describe(flare_fit_rec,nsub)
Make alpha / beta datasets p/p
Use the summary of the stan model to extract the different parameters we want to try to use to recreate our data.
params_rec <- summary(flare_fit_rec)
alpha_est_rec <- data.frame(params_rec$summary[1:nsub,1])
beta_plus_rec <- data.frame(matrix(ncol = 1,nrow=nsub))
beta_minus_rec <- data.frame(matrix(ncol = 1,nrow=nsub))
names(beta_plus_rec) <- "beta_plus"
names(beta_minus_rec) <- "beta_minus"
subp = 0
subm = 0
for ( i in 343:1026){
if (i%%2 == 1){
subp= subp+1
beta_plus_rec[subp,1] <- params_rec$summary[i,1]
} else if (i%%2 == 0) {
subm= subm+1
beta_minus_rec[subm,1] <- params_rec$summary[i,1]
}
}
Correlate original parameters with recovered parameters
Compare the parameters recovered from the simulated ratings with the original estimates and see how well they align…
Only showing the diagonals from corr.test (psych package) here.
print("original with recovered: ALPHA")
diag(corr.test(alpha_est_rec,alpha_est)$r)
print("original with recovered: BETA PLUS")
diag(corr.test(beta_plus_rec,beta_plus)$r)
print("original with recovered: BETA MINUS")
diag(corr.test(beta_minus_rec,beta_minus)$r)
Expanding on the best base model
Potentially interesting parameters to add to best fit model
Model 8: Punishment sensitivity
How aversive they find the scream reinforcement. Modelling this on the loss aversion parameter in Charpentier et al. (see the last page before the references).
This will be a single parameter per person, and represents how much the scream influences their ratings.
Based on the paper, we will try to model this in Stan by including it in our value calculations for the CS+ and CS- respectively. We do this by letting it influence how much the prediction error changes depending on whether a scream occurred or not. The prediction error is later used to update the value rating per stimulus:
\[d(stimulus,trial) = scream*\lambda-v(stimulus,trial-1)\]
where \[\lambda = \text{sensitivity to screams}\]
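A one-step R sketch of this update (illustrative values; scream is 1 when the scream occurred and 0 otherwise):

```r
# Hypothetical single update step with punishment sensitivity lambda:
# delta = scream * lambda - v, and v moves by alpha * delta.
update_v <- function(v, scream, alpha, lambda) {
  delta <- scream * lambda - v
  v + alpha * delta
}

# With lambda = 2, a reinforced trial pulls the value up more strongly
# than it would with lambda = 1.
update_v(0.5, scream = 1, alpha = 0.1, lambda = 2)  # 0.65
```

(Note that with lambda > 1 the implied target exceeds 1, so the value can drift above 1, which may matter for the beta shape calculations.)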
Model
#script
stanname='beta_mean1beta_PunSens.stan'
flare_fit <- model_run('full',stanname,flare_data)
## get some basic output descriptions printed to screen
out_describe(flare_fit,nsub)
# extract fit data
summary_flare <- summary(flare_fit)
Create BIC from log likelihood
## extract log likelihood
flare_loglike <- extract_log_lik(flare_fit, parameter_name = "loglik", merge_chains = TRUE)
#calculate BIC
FLARe_bic<-bic(ntrials,-colMeans(flare_loglike),3) # number of parameters in this model (alpha, beta, lambda)
## mean BIC as model comparisons tool:
print("Mean Bayesian information criterion for model")
mean(FLARe_bic)
Add to bar plot
mod_comp <- rbind(na.omit(mod_comp),c("Punishment sensitivity",mean(FLARe_bic)))
mod_comp$BIC <- odp(as.numeric(mod_comp$BIC))
mod_comp <- as.data.frame(na.omit(mod_comp))
## plot function - create plot
plot_models(mod_comp)
Generate
Make alpha / beta datasets p/p
Use the summary of the stan model to extract the different parameters we want to try to use to recreate our data.
params <- summary(flare_fit)
alpha_est <- data.frame(params$summary[1:nsub,1])
beta <- data.frame(params$summary[(nsub+1):(nsub*2),1])
lambda <- data.frame(params$summary[(nsub*3+1):(nsub*4),1])
names(alpha_est) <- "alpha"
names(beta) <- "beta"
names(lambda) <- "lambda"
Initialise empty datasets to hold the predicted ratings
rating_est_plus <- data.frame(matrix(ncol=ntrials,nrow=nsub))
rating_est_minus <- data.frame(matrix(ncol=ntrials,nrow=nsub))
# beta shape parameters
shape1p <- data.frame(matrix(ncol=ntrials,nrow=nsub))
shape1m <- data.frame(matrix(ncol=ntrials,nrow=nsub))
shape2p <- data.frame(matrix(ncol=ntrials,nrow=nsub))
shape2m <- data.frame(matrix(ncol=ntrials,nrow=nsub))
# V parameters (initialised around 0.5 with a little normally distributed noise, sd = 0.025)
vp <- data.frame(matrix(ncol=ntrials,nrow=nsub))
vm <- data.frame(matrix(ncol=ntrials,nrow=nsub))
vp[1] <- rnorm(nsub,0.5,0.025)
vm[1] <- rnorm(nsub,0.5,0.025)
# prediction error
dp <- data.frame(matrix(ncol=(ntrials-1),nrow=nsub))
dm <- data.frame(matrix(ncol=(ntrials-1),nrow=nsub))
Simulate ratings
Use our extracted parameters in place of estimating them, following the Stan syntax.
Populate our vplus and delta frames
use the alpha parameters we've extracted (alpha_est). d == delta (prediction error); v == value (i.e. value for each stimulus)
for (p in 1:nsub){
for (t in 1:(ntrials-1)){
dp[p,t] <- screamPlus[p,t]*lambda[p,]-vp[p,t]
dm[p,t] <- screamMinus[p,t]*lambda[p,]-vm[p,t]
vp[p,t+1] <- vp[p,t]+alpha_est[p,1]*dp[p,t]
vm[p,t+1]<- vm[p,t]+alpha_est[p,1]*dm[p,t]
}
}
For reference, the corresponding shape / likelihood block from the Stan script (Stan syntax, not R; the R translation follows below):
for (t in 1:ntrials){
shape1_Plus[t,p] = VPlus[t,p] * ((VPlus[t,p] * (1-VPlus[t,p])) / beta[p,1]);
shape1_Minus[t,p] = VMinus[t,p] * ((VMinus[t,p] * (1-VMinus[t,p])) / beta[p,2]);
shape2_Plus[t,p] = (1-VPlus[t,p]) * ((VPlus[t,p] * (1-VPlus[t,p])) / beta[p,1]);
shape2_Minus[t,p] = (1-VMinus[t,p]) * ((VMinus[t,p] * (1-VMinus[t,p])) / beta[p,2]);
ratingsPlus[t,p] ~ beta(shape1_Plus[t,p],shape2_Plus[t,p]);
ratingsMinus[t,p] ~ beta(shape1_Minus[t,p],shape2_Minus[t,p]);
}
} }
populate beta parameter shape frames
Use the new v frames and beta parameters.
Shape 1 and 2 are sufficient parameters for the beta distribution
for (p in 1:nsub){
for (t in 1:ntrials){
shape1p[p,t] = vp[p,t] * ((vp[p,t] * (1-vp[p,t])) / beta[p,1])
shape1m[p,t] = vm[p,t] * ((vm[p,t] * (1-vm[p,t])) / beta[p,1])
shape2p[p,t] = (1-vp[p,t]) * ((vp[p,t] * (1-vp[p,t])) / beta[p,1])
shape2m[p,t] = (1-vm[p,t]) * ((vm[p,t] * (1-vm[p,t])) / beta[p,1])
}
}
Estimate ratings
Using rbeta here (which draws random samples from the beta distribution with the given shape parameters).
For now, taking 1000 draws per trial and using their average as the estimated rating…
for (p in 1:nsub){
for (t in 1:ntrials){
rating_est_plus[p,t] <- mean(rbeta(1000,shape1p[p,t],shape2p[p,t]))
rating_est_minus[p,t] <- mean(rbeta(1000,shape1m[p,t],shape2m[p,t]))
}
}
Correlate actual ratings with simulated ratings
use the simulated ratings per person that we have derived using our parameters and see how well they align with the real ratings…
Only showing the diagonals from corr.test (psych package) here to get the important t1 x t1 etc. values.
print("real ratings with estimated ratings: CS MINUS")
diag(corr.test(rating_est_minus,minus_scaled)$r)
print("real ratings with estimated ratings: CS MINUS (average for all trials)")
cor(rowMeans(rating_est_minus),rowMeans(minus_scaled))
print("real ratings with estimated ratings: CS PLUS")
diag(corr.test(rating_est_plus,plus_scaled)$r)
print("real ratings with estimated ratings: CS PLUS (average for all trials)")
cor(rowMeans(rating_est_plus),rowMeans(plus_scaled))
Recover
Here we check whether we can recover the same estimates from the simulated ratings: run Stan using the estimated ratings instead of the real ones, and see whether we get the same alpha / beta parameters.
run stan model
#script
stanname='beta_mean1beta_PunSens.stan'
# data
flare_data_rec<-list(ntrials=ntrials,nsub=nsub,screamPlus = t(screamPlus), screamMinus= t(screamMinus),ratingsPlus=t(rating_est_plus),ratingsMinus=t(rating_est_minus),cdf_scale=cdf_scale)
flare_fit_rec <- model_run('full',stanname,flare_data_rec)
## get some basic output descriptions printed to screen
out_describe(flare_fit_rec,nsub)
Make alpha / beta datasets p/p
Use the summary of the stan model to extract the different parameters we want to try to use to recreate our data.
params_rec <- summary(flare_fit_rec)
alpha_est_rec <- data.frame(params_rec$summary[1:nsub,1])
beta_rec <- data.frame(params_rec$summary[(nsub+1):(nsub*2),1])
lambda_rec <- data.frame(params_rec$summary[(nsub*3+1):(nsub*4),1])
Correlate original parameters with recovered parameters
Compare the parameters recovered from the simulated ratings with the original estimates and see how well they align…
Only showing the diagonals from corr.test (psych package) here.
print("original with recovered: ALPHA")
diag(corr.test(alpha_est_rec,alpha_est)$r)
print("original with recovered: BETA")
diag(corr.test(beta_rec,beta)$r)
print("original with recovered: LAMBDA")
diag(corr.test(lambda_rec,lambda)$r)
Beta is very poorly recovered here. Alpha and Lambda are recovered exceptionally well.
Will try a quick two-beta model with punishment sensitivity to see if this improves things.
Model 9: Punishment sensitivity 2 beta
Model
#script
stanname='beta_mean1beta_PunSens2Beta.stan'
flare_fit <- model_run('full',stanname,flare_data)
## get some basic output descriptions printed to screen
out_describe(flare_fit,nsub)
# extract fit data
summary_flare <- summary(flare_fit)
Create BIC from log likelihood
## extract log likelihood
flare_loglike <- extract_log_lik(flare_fit, parameter_name = "loglik", merge_chains = TRUE)
#calculate BIC
FLARe_bic <- bic(ntrials, -colMeans(flare_loglike), 2) # last argument = number of free parameters in this model
## mean BIC as model comparisons tool:
print("Mean Bayesian information criterion for model")
mean(FLARe_bic)
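For reference, a minimal sketch of what the `bic()` helper (defined earlier in this lab book) presumably computes; the function name `bic_sketch` and the argument order are assumptions based on the call above, with n the number of observations, nll the negative log likelihood and k the number of free parameters:

```r
# standard Bayesian information criterion: BIC = k*log(n) + 2*NLL
# (sketch only -- the real bic() helper is defined earlier in the notebook)
bic_sketch <- function(n, nll, k) k * log(n) + 2 * nll

# e.g. 26 trials, negative log likelihood of 30, 2 free parameters
bic_sketch(26, 30, 2)  # 2*log(26) + 60, about 66.5
```

Lower BIC is better, so the mean across subjects gives a simple model-comparison score.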
Add to bar plot
mod_comp <- rbind(na.omit(mod_comp),c("Punishment sensitivity 2 beta",mean(FLARe_bic)))
mod_comp$BIC <- odp(as.numeric(mod_comp$BIC))
mod_comp <- as.data.frame(na.omit(mod_comp))
## plot function - create plot
plot_models(mod_comp)
Generate
Make alpha / beta datasets p/p
Use the summary of the Stan model to extract the parameters we will use to recreate our data.
params <- summary(flare_fit)
alpha_est <- data.frame(params$summary[1:nsub,1])
beta_all <- data.frame(params$summary[(nsub+1):(nsub*3),1])
lambda <- data.frame(params$summary[(nsub*4+1):(nsub*5),1])
# divide beta into the two...
## p is 1, so the odd rows
beta_p <- data.frame(beta_all[ c(TRUE,FALSE), ]) # odd rows
beta_m <- data.frame(beta_all[ !c(TRUE,FALSE), ]) # even rows
names(alpha_est) <- "alpha"
names(beta_p) <- "beta_plus"
names(beta_m) <- "beta_minus"
names(lambda) <- "lambda"
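The odd/even split above works because R recycles the short logical vector `c(TRUE, FALSE)` down the rows of the interleaved parameter frame. A self-contained check with toy values:

```r
# toy stand-in for beta_all: 3 subjects x 2 betas, interleaved (plus, minus, ...)
toy <- data.frame(est = c(10, 20, 11, 21, 12, 22))

odd_rows  <- toy[c(TRUE, FALSE), , drop = FALSE]   # rows 1, 3, 5 -> beta_plus
even_rows <- toy[!c(TRUE, FALSE), , drop = FALSE]  # rows 2, 4, 6 -> beta_minus

odd_rows$est   # 10 11 12
even_rows$est  # 20 21 22
```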
Initialise empty datasets to hold the predicted ratings
rating_est_plus <- data.frame(matrix(ncol=ntrials,nrow=nsub))
rating_est_minus <- data.frame(matrix(ncol=ntrials,nrow=nsub))
# beta shape parameters
shape1p <- data.frame(matrix(ncol=ntrials,nrow=nsub))
shape1m <- data.frame(matrix(ncol=ntrials,nrow=nsub))
shape2p <- data.frame(matrix(ncol=ntrials,nrow=nsub))
shape2m <- data.frame(matrix(ncol=ntrials,nrow=nsub))
# V parameters (initialised from a normal distribution centred on 0.5 with sd 0.025)
vp <- data.frame(matrix(ncol=ntrials,nrow=nsub))
vm <- data.frame(matrix(ncol=ntrials,nrow=nsub))
vp[1] <- rnorm(nsub,0.5,0.025)
vm[1] <- rnorm(nsub,0.5,0.025)
# prediction error
dp <- data.frame(matrix(ncol=(ntrials-1),nrow=nsub))
dm <- data.frame(matrix(ncol=(ntrials-1),nrow=nsub))
Simulate ratings
Use our extracted parameters in place of estimating them afresh, mirroring the Stan syntax.
Populate our vplus and delta frames
Use the alpha parameters we’ve extracted (alpha_est); d == delta (prediction error), v == value (i.e. value for each stimulus).
for (p in 1:nsub){
for (t in 1:(ntrials-1)){
dp[p,t] <- screamPlus[p,t]*lambda[p,]-vp[p,t]
dm[p,t] <- screamMinus[p,t]*lambda[p,]-vm[p,t]
vp[p,t+1] <- vp[p,t]+alpha_est[p,1]*dp[p,t]
vm[p,t+1]<- vm[p,t]+alpha_est[p,1]*dm[p,t]
}
}
For reference, the corresponding block from the Stan model (two-beta version):
for (t in 1:ntrials){
  shape1_Plus[t,p] = VPlus[t,p] * ((VPlus[t,p] * (1-VPlus[t,p])) / beta[p,1]);
  shape1_Minus[t,p] = VMinus[t,p] * ((VMinus[t,p] * (1-VMinus[t,p])) / beta[p,2]);
  shape2_Plus[t,p] = (1-VPlus[t,p]) * ((VPlus[t,p] * (1-VPlus[t,p])) / beta[p,1]);
  shape2_Minus[t,p] = (1-VMinus[t,p]) * ((VMinus[t,p] * (1-VMinus[t,p])) / beta[p,2]);
  ratingsPlus[t,p] ~ beta(shape1_Plus[t,p],shape2_Plus[t,p]);
  ratingsMinus[t,p] ~ beta(shape1_Minus[t,p],shape2_Minus[t,p]);
}
Populate the beta shape-parameter frames
Use the new v frames and the beta parameters.
Shape 1 and shape 2 are the sufficient parameters of the beta distribution.
for (p in 1:nsub){
for (t in 1:ntrials){
shape1p[p,t] = vp[p,t] * ((vp[p,t] * (1-vp[p,t])) / beta_p[p,1])
shape2p[p,t] = (1-vp[p,t]) * ((vp[p,t] * (1-vp[p,t])) / beta_p[p,1])
shape1m[p,t] = vm[p,t] * ((vm[p,t] * (1-vm[p,t])) / beta_m[p,1])
shape2m[p,t] = (1-vm[p,t]) * ((vm[p,t] * (1-vm[p,t])) / beta_m[p,1])
}
}
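A sanity check on this parameterisation (toy v and beta values): with k = v(1-v)/beta, shape1 = v*k and shape2 = (1-v)*k, the mean of the resulting beta distribution, shape1/(shape1+shape2), is exactly v. So the simulated ratings are centred on the model's value estimates, and beta only controls the spread.

```r
v    <- 0.7    # hypothetical stimulus value for one trial
beta <- 0.02   # hypothetical per-person variance parameter

k      <- v * (1 - v) / beta
shape1 <- v * k
shape2 <- (1 - v) * k

shape1 / (shape1 + shape2)  # recovers v exactly: 0.7
# beta distribution variance; shrinks as the beta parameter shrinks
(shape1 * shape2) / ((shape1 + shape2)^2 * (shape1 + shape2 + 1))
```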
Estimate ratings
Estimating each rating from the fitted beta distribution: the code below draws from the distribution with rbeta, given the two shape parameters per trial, and takes the average of 1000 draws…
for (p in 1:nsub){
for (t in 1:ntrials){
rating_est_plus[p,t] <- mean(rbeta(1000,shape1p[p,t],shape2p[p,t]))
rating_est_minus[p,t] <- mean(rbeta(1000,shape1m[p,t],shape2m[p,t]))
}
}
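Note that `mean(rbeta(1000, a, b))` is a Monte Carlo estimate of the analytic beta mean a/(a+b), so the loop above could also use the closed form directly, which is deterministic and cheaper. A quick check with hypothetical shape values:

```r
set.seed(123)
a <- 7.35
b <- 3.15  # hypothetical shape parameters for one trial

mc_mean       <- mean(rbeta(10000, a, b))  # Monte Carlo estimate, as in the loop
analytic_mean <- a / (a + b)               # closed form: 0.7

abs(mc_mean - analytic_mean)  # small sampling error; shrinks with more draws
```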
Correlate actual ratings with simulated ratings
use the simulated ratings per person that we have derived using our parameters and see how well they align with the real ratings…
Only showing the diagonals from the corr.test output here to get the important t1 x t1 etc. values.
print("real ratings with estimated ratings: CS MINUS")
diag(corr.test(rating_est_minus,minus_scaled)$r)
print("real ratings with estimated ratings: CS MINUS (average for all trials)")
cor(rowMeans(rating_est_minus),rowMeans(minus_scaled))
print("real ratings with estimated ratings: CS PLUS")
diag(corr.test(rating_est_plus,plus_scaled)$r)
print("real ratings with estimated ratings: CS PLUS (average for all trials)")
cor(rowMeans(rating_est_plus),rowMeans(plus_scaled))
The generated ratings are worse with this model overall.
Recover
Here we are seeing if we can recover the same estimates from the simulated ratings. Basically run the same Stan model, but using the estimated ratings instead of the real ones, and see if we get the same alpha / beta parameters back.
run stan model
#script
stanname='beta_mean1beta_PunSens2Beta.stan'
# data
flare_data_rec<-list(ntrials=ntrials,nsub=nsub,screamPlus = t(screamPlus), screamMinus= t(screamMinus),ratingsPlus=t(rating_est_plus),ratingsMinus=t(rating_est_minus),cdf_scale=cdf_scale)
flare_fit_rec <- model_run('full',stanname,flare_data_rec)
## get some basic output descriptions printed to screen
out_describe(flare_fit_rec,nsub)
Make alpha / beta datasets p/p
Use the summary of the Stan model to extract the parameters we will use to recreate our data.
params_rec <- summary(flare_fit_rec)
alpha_est_rec <- data.frame(params_rec$summary[1:nsub,1])
beta_all_rec <- data.frame(params_rec$summary[(nsub+1):(nsub*3),1])
lambda_rec <- data.frame(params_rec$summary[(nsub*4+1):(nsub*5),1])
# divide beta into the two...
## p is 1, so the odd rows
beta_p_rec <- data.frame(beta_all_rec[ c(TRUE,FALSE), ]) # odd rows
beta_m_rec <- data.frame(beta_all_rec[ !c(TRUE,FALSE), ]) # even rows
names(alpha_est_rec) <- "alpha"
names(beta_p_rec) <- "beta_plus"
names(beta_m_rec) <- "beta_minus"
names(lambda_rec) <- "lambda"
Correlate original parameters with recovered parameters
Take the parameters recovered from the simulated ratings and see how well they align with the original estimates…
Only showing the diagonals from the corr.test output here to get the important t1 x t1 etc. values.
print("original with recovered: ALPHA")
diag(corr.test(alpha_est_rec,alpha_est)$r)
print("original with recovered: BETA PLUS")
diag(corr.test(beta_p_rec,beta_p)$r)
print("original with recovered: BETA MINUS")
diag(corr.test(beta_m_rec,beta_m)$r)
print("original with recovered: LAMBDA")
diag(corr.test(lambda_rec,lambda)$r)
Interestingly, recovery is much worse for lambda here, and not great for the betas either; only alpha remains acceptable. So overall the one-beta model is probably better, even though its BIC is slightly worse.
Dual learning
Model 10: dual learning + punishment sensitivity ( 1 beta)
A la the Toby paper - a dual learning model.
Allows each stimulus value to update based on the other stimulus's outcomes.
This version also includes the punishment sensitivity multiplier, as this was the best model so far.
Model
#script
stanname='beta_DualLearn_PunSens_1beta.stan'
flare_fit <- model_run('min',stanname,flare_data)
## get some basic output descriptions printed to screen
out_describe(flare_fit,nsub)
# extract fit data
summary_flare <- summary(flare_fit)
Create BIC from log likelihood
## extract log likelihood
flare_loglike <- extract_log_lik(flare_fit, parameter_name = "loglik", merge_chains = TRUE)
#calculate BIC
FLARe_bic <- bic(ntrials, -colMeans(flare_loglike), 2) # last argument = number of free parameters in this model
## mean BIC as model comparisons tool:
print("Mean Bayesian information criterion for model")
mean(FLARe_bic)
Add to bar plot
mod_comp <- rbind(na.omit(mod_comp),c("Dual Learning",mean(FLARe_bic)))
mod_comp$BIC <- odp(as.numeric(mod_comp$BIC))
mod_comp <- as.data.frame(na.omit(mod_comp))
## plot function - create plot
plot_models(mod_comp)
Rating consistency
A parameter that represents the rating consistency across multiple repeated / similar trials. I think it would be best to have one each for the CS+ and CS-, given these differ in terms of how similar the trials are (the CS- is always unreinforced, for example). I can imagine consistency is a parameter that is constant regardless of reinforcement / stimulus type though, especially in later phases, so it is worth testing both models.
A similar parameter is used in the Charpentier et al. paper (see the last page before the references).
We will estimate this parameter as a factor that influences the overall shape of the choice probability distribution (beta distribution). It will do this via the sufficient parameters, which are influenced by stimulus value etc. per trial.
note very unsure about this - need to check it out with Alex
\[shape1(stimulus) = \left(1 + exp\left(-\mu * VPlus[t,p] * ((VPlus[t,p] * (1-VPlus[t,p])) / beta[p,1])\right)\right)^{-1}\] where \(\mu\) is the logit sensitivity.
Logit sensitivity here effectively governs rating consistency; higher values should mean greater consistency.
Model
#script
stanname='beta_mean1beta_Consistency.stan'
flare_fit <- model_run('min',stanname,flare_data)
## get some basic output descriptions printed to screen
out_describe(flare_fit,nsub)
# extract fit data
summary_flare <- summary(flare_fit)
BIC is terrible
Create BIC from log likelihood
## extract log likelihood
flare_loglike <- extract_log_lik(flare_fit, parameter_name = "loglik", merge_chains = TRUE)
#calculate BIC
FLARe_bic <- bic(ntrials, -colMeans(flare_loglike), 2) # last argument = number of free parameters in this model
## mean BIC as model comparisons tool:
print("Mean Bayesian information criterion for model")
mean(FLARe_bic)
Add to bar plot
mod_comp <- rbind(na.omit(mod_comp),c("Consistency",mean(FLARe_bic)))
mod_comp$BIC <- odp(as.numeric(mod_comp$BIC))
mod_comp <- as.data.frame(na.omit(mod_comp))
## plot function - create plot
plot_models(mod_comp)
Generalisation
NOTE TO SELF:
maybe generalise the prediction error NOT the value (so add some of deltaP to the VMinus calculation…)
some intro
Basically here we want to capture a parameter that estimates how much the learning from the reinforced stimulus influences responses to the ‘safe’ stimulus.
Basing my first effort on Norbury, Robbins & Seymour, 2018, and their finding that there is generalisation based on value and perceptual processes.
From their abstract
“We found that generalization of avoidance could be parsed into perceptual and value-based processes, and further, that value-based generalization could be subdivided into that relating to aversive and neutral feedback”…. “Further, generalization from aversive, but not neutral, feedback was associated with self-reported anxiety and intrusive thoughts. These results reveal a set of distinct mechanisms that mediate generalization in avoidance learning, and show how specific individual differences within them can yield anxiety.”
from introduction
“It is therefore possible that under-generalization of safety cues, as opposed to over-generalization of aversive cues, might be a contributing factor to susceptibility to disorders such as generalized anxiety”
Equations
I will basically replicate the visual and value generalisation equations.
Visual (i.e. possible identity confusion between the two stimuli):
\[V_{c,m} = 0.80*V_{c,m} + 0.20*V_{c,p}\]
Value:
\[G_s = 1/exp((\rho_o - \rho_s)^2 / (2*\delta^2))\] where \(s\) indexes the current stimulus and \(o\) the other stimulus. \(\rho\) is the parameter governing shape 'spikiness'. \(\delta\) is a free parameter that governs the width of the Gaussian function underlying generalisation. This parameter should probably differ depending on trial outcome (scream or neutral): \(\delta_s\) and \(\delta_n\).
The authors update their value by multiplying it by the generalisation of the current 'state' (stimulus), i.e. with \(G_s\) as the last term.
First we will use a single \(\delta\) value, rather than updating per trial depending on if it is scream + or -
So, for ours the following will be added to the punishment sensitivity single beta model.
One generalisation parameter
Generalisation:
\[G = 1/exp((\rho_m - \rho_p)^2 / (2*\delta^2))\]
generalisation per stimulus
Generalisation plus:
\[G_p = 1/exp((\rho_m - \rho_p)^2 / (2*\delta^2))\]
Generalisation minus:
\[G_m = 1/exp((\rho_p - \rho_m)^2 / (2*\delta^2))\]
Value plus:
\[VPlus[t+1,p]=VPlus[t,p]+alpha[p]*PredErrorPlus[t,p]*G_p\]
Value minus:
\[VMinus[t+1,p]=VMinus[t,p]+alpha[p]*PredErrorMinus[t,p]*G_m\]
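The Gaussian generalisation term can be sketched directly: it equals 1 when the two stimulus representations coincide and decays towards 0 as they separate, with delta controlling the width (the rho and delta values below are hypothetical):

```r
# Gaussian generalisation kernel: G = 1 / exp((rho_o - rho_s)^2 / (2 * delta^2))
gen <- function(rho_s, rho_o, delta) 1 / exp((rho_o - rho_s)^2 / (2 * delta^2))

gen(0.5, 0.5, delta = 1)  # identical stimuli: G = 1 (full generalisation)
gen(0.5, 2.5, delta = 1)  # distant stimuli: G = exp(-2), much less generalisation
gen(0.5, 2.5, delta = 5)  # wider delta: more generalisation at the same distance
```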
additional things to try:
dependence on prediction error
\(\kappa\) will be a free parameter that indicates difference in dependence on prediction error history, and will make the v updates like so:
Value plus:
\[VPlus[t+1,p]=VPlus[t,p]+\kappa*alpha[p]*PredErrorPlus[t,p]*G_p\]
Value minus:
\[VMinus[t+1,p]=VMinus[t,p]+\kappa*alpha[p]*PredErrorMinus[t,p]*G_m\]
updating learning rate per trial, per stimulus
the alpha would become
\[\alpha_{t+1} = \eta*|PredError_{p,t}| + (1-\eta)*\alpha_t\]
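The per-trial learning-rate update can be sketched as below (all values hypothetical): with eta = 0 alpha never changes, and with eta = 1 it tracks the most recent absolute surprise completely:

```r
# Pearce-Hall style update: alpha_{t+1} = eta * |prediction error| + (1 - eta) * alpha_t
update_alpha <- function(alpha, pred_error, eta) {
  eta * abs(pred_error) + (1 - eta) * alpha
}

update_alpha(alpha = 0.3, pred_error = 0.8, eta = 0)     # 0.3: no updating
update_alpha(alpha = 0.3, pred_error = 0.8, eta = 1)     # 0.8: pure surprise
update_alpha(alpha = 0.3, pred_error = -0.8, eta = 0.5)  # 0.55: a mixture
```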
Model 11: Visual generalisation
Allow visual generalisation between the two stimuli to change their values per trial.
Multiply each stimulus's own value by 0.8 and add 0.2 times the other stimulus's value (see equations above).
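The per-trial mixing can be sketched in two lines (the values below are hypothetical); each updated value moves 20% of the way towards the other stimulus:

```r
v_plus  <- 0.9  # hypothetical current CS+ value
v_minus <- 0.2  # hypothetical current CS- value

v_minus_mixed <- 0.8 * v_minus + 0.2 * v_plus   # 0.16 + 0.18 = 0.34
v_plus_mixed  <- 0.8 * v_plus  + 0.2 * v_minus  # 0.72 + 0.04 = 0.76
```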
Model
#script
stanname='beta_Visual_Gen.stan'
flare_fit <- model_run('min',stanname,flare_data_nolog)
## get some basic output descriptions printed to screen
out_describe(flare_fit,nsub)
# extract fit data
summary_flare <- summary(flare_fit)
Create BIC from log likelihood
## extract log likelihood
flare_loglike <- extract_log_lik(flare_fit, parameter_name = "loglik", merge_chains = TRUE)
#calculate BIC
FLARe_bic <- bic(ntrials, -colMeans(flare_loglike), 2) # last argument = number of free parameters in this model
## mean BIC as model comparisons tool:
print("Mean Bayesian information criterion for model")
mean(FLARe_bic)
Add to bar plot
mod_comp <- rbind(na.omit(mod_comp),c("Visual Generalisation",mean(FLARe_bic)))
mod_comp$BIC <- odp(as.numeric(mod_comp$BIC))
mod_comp <- as.data.frame(na.omit(mod_comp))
## plot function - create plot
plot_models(mod_comp)
avoidance
Using volume per trial as an avoidance outcome and modelling it as per Norbury et al. in a sort of drift-diffusion way. Basically updating the value of avoiding vs not avoiding, where avoid == volume turned down on that trial.
Might want to use a Pearce-Hall associability rule to update the learning rate here…
“According to this rule, the learning rate on each trial is determined by the absolute magnitude of past prediction errors, such that state-action value estimates are updated by more when previous outcomes have been more surprising, and by less when they were less surprising. This allows for learning in terms of modelled value adjustment to be greater when outcomes are more surprising (e.g. at the start of the task), but to be lesser (leading to more stable values) when outcomes are better predicted. A non-constant learning rate also ensures that parameters governing width of value-based generalization, which are assumed to be constant over the course of the task, are identifiable during parameter estimation (see below equations).”
Forgetting
This parameter captures how much participants retain what they learned over previous trials and use it to inform the current rating.
to do
investigate change point detection parameters (when reinforcement changes - i.e. moving from acquisition to extinction); could do this or model the phases separately - check which fits best
add priors! These are what you expect the group to look like (i.e. alpha is normally distributed around a mean of 0.5 with a variance of 10, or something). LOOK UP RStan choice of priors. * Can have informative or uninformative priors (i.e. agnostic or not)
Push any updates to github
Uncomment the lines below if you made any changes.
## initialise bash directory and filename
stanname="punish_only.stan"
scriptdir="/Users/kirstin/Dropbox/SGDP/FLARe/FLARe_MASTER/Projects/Hierachal_modelling/Scripts"
## stage
#git add $scriptdir/$stanname
## push
#git push Bayes_modelling
---
title: "Hierachical computational modelling - FLARe"
author: "Kirstin Purves"
date: "1 April 2019"

output:
  html_document:
    df_print: paged
    toc: yes
    toc_depth: 2
    toc_float:
      collapsed: true
    number_sections: true
    highlight: monochrome
    theme: cerulean
    code_folding: show
     
  html_notebook:
    theme: cerulean
    toc: yes
   
---

# Introduction {.tabset}

Lab book for analyses using hierarchical computational modelling to identify parameters that define the best model of learning as it applies to fear conditioning acquisition and extinction, using FLARe fear conditioning data. 
Long abstract, justification and analysis plan found in the prelim manuscript [here](https://docs.google.com/document/d/1JhVCf0jlXFwXYQ2kjS3fpl7mYexDcULn7L1ZgJ6Nolw/edit?usp=sharing)

In short:

## Aims    
     
1.  Identify model of learning based on a priori hypotheses that best fits the trajectories of fear relevant learning in our FLARe dataset
      + Use all first week data from Validation, app TRT, lab TRT, Pilot, Headphones (n = 223 after exclusions)
      + Include Acquisition, extinction (trajectories representing fear learning and treatment)
      + Identify parameters that define these trajectories
          + e.g. Learning rate, plateau, first ambiguous trial etc.
          
2.  Cross validate best fitting model in TEDS data

3.  Are these parameters associated with other measures of individual differences in our datasets?
      + Personality (Neuroticism)
      + Current anxiety symptoms (GAD-7) - equivalent of baseline symptoms (Chris + Meg analyses)
      + Lifetime / trait anxiety (STAI / ASI - FLARe analyses)
      + Current depression symptoms (PHQ-9) -  equivalent of baseline symptoms (Chris + Meg analyses)
      + Interpretation biases (IUS, ASSIQ - FLARe analyses)
      + SES (Meg IAPT: benefits, employment etc) 
      + Gender (Meg analyses)
      + Emotion regulation profile (potentially LCA based?)


## Impact and relevance

```
Evidence from both human (Richter et al., 2012) and rodent (Galatzer-Levy, Bonanno, Bush, & LeDoux, 2013) studies suggest that trajectories of how we learn and extinguish fear differ between individuals. Different trajectories of fear and extinction have also been found using fear conditioning studies (e.g. Duits et al., 2016), a good model for the learning of, and treatment for, fear and anxiety disorders. It is likely that these trajectories of fear extinction might predict outcomes in exposure-based cognitive behavioural therapy (Kindt, 2014). 
 
Identifying parameters that predict individual trajectories of fear learning and extinction will enable us to harness fear conditioning data more effectively to aid in understanding mechanisms underlying the development of and treatment for anxiety disorders. With more accurate models of these processes, the potential to use fear conditioning paradigms to predict those most at risk of developing an anxiety disorder, and those who might respond best to exposure-based treatments, greatly improves.
```

## Useful references

[Sutton and Barto Reinforcement Learning](http://incompleteideas.net/book/RLbook2018.pdf) - Textbook on reinforcement learning   
[Anxiety promotes memory for mood-congruent faces but does not alter loss aversion (Charpentier...Robinson, 2015)](https://www.nature.com/articles/srep24746.pdf) - Good example of a sensitivity learning parameter    
[Hypotheses About the Relationship of Cognition With Psychopathology Should be Tested by Embedding Them Into Empirical Priors (Moutoussist et al., 2018)](https://www.frontiersin.org/articles/10.3389/fpsyg.2018.02504/full) - Including variables of interest (e.g. anxiety) in the model    


Toby Wise has just submitted an aversive learning paper incorporating beta probability distributions in the best model for uncertain learning parameters etc.

A copy of this is ![here](/Users/kirstin/Dropbox/SGDP/FLARe/PDFs/uncertainty_attention_paper_PLoSCB.pdf)


## Analysis plan 

1.  Define set of a priori models moving from simple to more complex  
      + Some parameters to include: 
        + Rate of learning (sometimes with punishment reinforcement)
        + Sensitivity to punishment
        + Pre-existing anxiety
        + SES? Gender?    
        

        
2.  Run each model and compare fit in FLARe pre TEDS data
      + Use Log likelihood and BIC etc.    
      
      
3.  Select best fitting model   


4.  Extract individual data for learning parameters from this model and see what factors best predict it
      + Anxiety (if anxiety isn't best as part of the model)
      + Interpretation biases
      + Tolerance of uncertainty
      + Cognitive emotional control
      + Emotional attentional control 
      + SES?
      + Gender?    
      

5.  Run all models again in FLARe TEDS
      + Decide if the same model best fits the data again.
      + See if we get similar results from the parameter prediction   
      
    


Will use a combination of `R.Version(3.5.1)`, `RStan (Version 2.18.2, GitRev: 2e1f913d3ca3)` and `hBayesDM package in R (3.5.1)` [Ahn, W.-Y., Haines, N., & Zhang, L. (2017). Revealing neuro-computational mechanisms of reinforcement learning and decision-making with the hBayesDM package. Computational Psychiatry, 1, 24-57.](https://doi.org/10.1162/CPSY_a_00), which uses RStan


## Modelling notes {.tabset}

### Intuition


Discussion with Vince Valton and Alex Pike about the best way to fit this model. As the observed outcomes (expectancy ratings) are non-binary and are related to each other (i.e. as you become more likely to select 9, you become less likely to select 1), we should consider each trial for each person for each stimulus as a constantly updating beta distribution. So you might see a pattern like this for the CS+ in acquisition, for example.

So, best model is likely to be one using beta distributions that show the probability distribution for each rating. 

We can use sufficient parameters to describe these (i.e. mean / sd or possibly the mode)

A useful intuition of the beta distribution can be found [here](https://stats.stackexchange.com/questions/47771/what-is-the-intuition-behind-beta-distribution)

and a useful website [here](https://matthewdharris.com/2016/10/18/estimating-a-beta-distribution-with-stan-hmc/)

*scaling*

We can scale the beta by how aversive participants find the shock. i.e. it might update their learning as if there was .5 a shock or 1.5 of a shock depending on their own sensitivity to the aversiveness / punishment.

*alpha*

*generalisation*

We can do this with a single beta distribution for each phase (collapsing over the two stimuli). This would be akin to a per phase generalisation parameter in that it will be smaller if they tend to choose the same expectancy for both stimuli and larger if they tend to choose very differently for both stimuli. 

***However***, these variables are not really equivalent (i.e. the reinforcement rate is different for the two stimuli, and we use this in the model).   

So instead we can create a paramater which is the value of cs- weighted by some value of the cs+. How much each individual weights by the Cs+ can be freely estimated by the model and can be the generalisation parameter.

So this would be vminus = vminus + (w)vplus, where w is a parameter freely estimated per person.
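As a one-line sketch (w and the values below are hypothetical):

```r
vminus <- 0.3   # hypothetical current CS- value
vplus  <- 0.8   # hypothetical current CS+ value
w      <- 0.25  # freely estimated generalisation weight, per person

vminus_updated <- vminus + w * vplus  # 0.3 + 0.25*0.8 = 0.5
```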


*per stimulus*
We probably want to model cs+ and cs- separately too - so have a beta distribution characterised by sufficient parameters for each.



*per trial*

All of the above can then also be done with updating per trial. 

*leaky beta*

we also need a model that incorporates 'leak', **i.e.** learning leak: participants will likely update more based on recent trials and learn less from more distant trials as time progresses. See Toby's paper ![here](/Users/kirstin/Dropbox/SGDP/FLARe/PDFs/uncertainty_attention_paper_PLoSCB.pdf) for more. 

*uncertainty*

We should consider incorporating a parameter that maps to participant uncertainty about outcomes. 

*anxiety*

Might be worth incorporating this as a model parameter / feature. Read this for more.

[Hypotheses About the Relationship of Cognition With Psychopathology Should be Tested by Embedding Them Into Empirical Priors (Moutoussist et al., 2018)](https://www.frontiersin.org/articles/10.3389/fpsyg.2018.02504/full)

### Log likelihood notes

As we are using a beta distribution, we will calculate the log likelihood from the probability density function of the distribution (i.e. where the peak of the shape is) given the participant's response at each trial. So we will sum the log density of each trial response, trial by trial, across the CS+ and CS-.

Will obtain 1 log likelihood and then 1 per trial and add together to make sure that these are comparable.


the basic stan terminology for this is below: 

>> beta_lpdf(rating[t,p]|shape1[t,p],shape2[t,p])

where beta_lpdf is the probability density given the rating made and each of the two beta distribution shape parameters that we estimate.

This is what we will use to compare models.
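The R equivalent of Stan's `beta_lpdf` is `dbeta(..., log = TRUE)`; summing it over trials gives a per-person log likelihood. A minimal sketch with made-up ratings and shape parameters:

```r
# per-trial log likelihood under a beta distribution, summed across trials
ratings <- c(0.2, 0.5, 0.8)  # hypothetical expectancy ratings, scaled to (0, 1)
shape1  <- c(2, 5, 8)        # hypothetical per-trial shape parameters
shape2  <- c(8, 5, 2)

loglik_per_trial <- dbeta(ratings, shape1, shape2, log = TRUE)
total_loglik     <- sum(loglik_per_trial)  # one number per person, used for BIC
```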

### Terminology

*V* == 'value'. Basically a parameter that captures the salience of the stimulus at any given point.    
*alpha* == 'learning rate'. A parameter that describes how sensitive people are to updating their learning. So a fast learning rate means that learning on any given trial is weighted more by the trials immediately preceding it than by distant ones, and a slow learning rate means that all past events influence learning more evenly.  Alex's tennis analogy is good here (Federer - stable player, can predict a win based on all matches; Murray - volatile player; his last match is the best predictor of next match performance). 
*beta* == 'confidence'. This is sort of an error term - how much variance in rating choice there is for each person/trial. Can be thought of as the variance (with \(\sqrt{beta}\) roughly the SD).   

This can be confusing as we are also using beta distributions (a different thing), which have two sufficient parameters (a and b).

### Beta distribution visualisation

How beta distributions change depending on whether you vary the alpha or beta parameter.

![A really nice summary visualisation](/Users/kirstin/Dropbox/Screenshots/beta_dist.png)

Here are some simulations I can change and play with to illustrate the same sort of thing. 


```{r beta simulations,echo=F}
x <- seq(0, 1, length = 21)
dbeta(x, 1, 1)
pbeta(x, 1, 1)

## Visualization, including limit cases:
pl.beta <- function(a,b, asp = if(isLim) 1, ylim = if(isLim) c(0,1.1)) {
  if(isLim <- a == 0 || b == 0 || a == Inf || b == Inf) {
    eps <- 1e-10
    x <- c(0, eps, (1:7)/16, 1/2+c(-eps,0,eps), (9:15)/16, 1-eps, 1)
  } else {
    x <- seq(0, 1, length = 1025)
  }
  fx <- cbind(dbeta(x, a,b), pbeta(x, a,b), qbeta(x, a,b))
  f <- fx; f[fx == Inf] <- 1e100
  matplot(x, f, ylab="", type="l", ylim=ylim, asp=asp,
          main = sprintf("[dpq]beta(x, a=%g, b=%g)", a,b))
  abline(0,1,     col="gray", lty=3)
  abline(h = 0:1, col="gray", lty=3)
  legend("top", paste0(c("d","p","q"), "beta(x, a,b)"),
         col=1:3, lty=1:3, bty = "n")
  invisible(cbind(x, fx))
}

## change alpha

print("stable beta, increasing alpha")
pl.beta(5, 5)
pl.beta(8, 5)
pl.beta(10, 5)
pl.beta(12, 5)
pl.beta(18, 5)


## change beta
print("stable alpha, increasing beta")
pl.beta(5, 5)
pl.beta(5, 8)
pl.beta(5, 10)
pl.beta(5, 12)
pl.beta(5, 15)


```

### Models to write / run

Will probably do all per trial. Will do an early sensitivity check to confirm this.


1.  Single beta, no scaling   
2.  Single beta, no scaling per trial.   
*** At this point, compare the two above. Ensure the per trial fits better, and if it does then do all below per trial***   
3.  As above, scaled 
4.  Single beta Single alpha reinforcement learning model (estimate both the beta and the alpha *i.e.* learning rate)
5.  Single beta single alpha reinforcement learning with mean + sd for the beta estimate as a parameter
6.  Beta per stimulus   
7.  Beta per stimulus + generalisation parameter (Vminus = vminus + wvplus)   
8.  Leaky beta   
9.  Leaky beta + uncertainty   
10.  Leaky beta + uncertainty + anxiety    

### Justification of model components

*Alpha*
Learning rate parameter. If high, learning is strongly influenced by recent trial events; if low, it is influenced more evenly by accumulating events.

* Single alpha per person 
    * Assumes that learning rate is a constant for each individual that might be scaled by other factors, such as certainty or sensitivity.

*Betas*
Variance/certainty parameter

* Single beta per person   
    *   Assumes that the general variance around ratings is the same regardless of stimulus, i.e. as much uncertainty for the CS+ as for the CS-
* Two betas per person
    * Assumes that confidence / uncertainty might differ by stimulus. Presumably as a factor of reinforcement rate.




# Preliminary {.tabset}

### Compare a priori to data {.tabset}

#### Simulate different learning rates {.tabset}

Only doing this 'accurately' for the acquisition CS+, as the simulations require a probability; I am using the contingency for this (0.75). If set to 0 for all other phases and stimuli, it looks as if the learning should be flat regardless of alpha. We expect in reality that this probability will vary between people and will be unlikely to be zero, so I also test 12 and 18 trials with probabilities of 0.5 and 0.2.

#####  12 trials; probability = 0.75

```{r learning rate simulations,echo=F}

#this is a very basic script
#created by Alex Pike 14/02/19
#it simulates the learning of the value (Q) of a rewarded stimulus
#alpha is the learning rate
#ntrials is the number of trials in the task
#'outcome_probabilistic' generates outcomes probabilistically for ntrials
#'outcome_deterministic' always has the same outcome (1)



## klp ACQ CSp sims

n_trials = 12
probability = 0.75; #edit this to change the probability of reward

#klp - loop through some different learning rates

print("Simulated learning rates. 12 trials; probability = 0.75 (CSp acq contingency) \n")

for (a in seq(0, 0.9, by = 0.1)) {
  
  alpha <- a   #learning rate (between 0 and 1)
  
outcome_deterministic = rep(1,n_trials)
outcome_probabilistic = ifelse(runif(n_trials)<probability,1,0)

Q=rep(0,n_trials)
par(mfrow=c(2,1))
for (t in 1:(n_trials-1)){
  Q[t+1]=Q[t]+alpha*(outcome_deterministic[t] - Q[t]);
}
plot(Q,type='l',col='blue',xlab='trial number')
title('Deterministic')

for (t in 1:(n_trials-1)){
  Q[t+1]=Q[t]+alpha*(outcome_probabilistic[t] - Q[t]);
}


plot(Q,type='l',col='red',xlab='trial number')
title('Probabilistic',sub=paste0("alpha = ",a,sep=" "))

}


```
#####  12 trials; probability = 0.5


```{r,echo=F}


n_trials = 12
probability = 0.5; #edit this to change the probability of reward

#klp - loop through some different learning rates

print("Simulated learning rates. 12 trials; Probability = 0.5\n")

for (a in seq(0, 0.9, by = 0.1)) {
  
  alpha <- a   #learning rate (between 0 and 1)
  
outcome_deterministic = rep(1,n_trials)
outcome_probabilistic = ifelse(runif(n_trials)<probability,1,0)

Q=rep(0,n_trials)
par(mfrow=c(2,1))
for (t in 1:(n_trials-1)){
  Q[t+1]=Q[t]+alpha*(outcome_deterministic[t] - Q[t]);
}
plot(Q,type='l',col='blue',xlab='trial number')
title('Deterministic')

for (t in 1:(n_trials-1)){
  Q[t+1]=Q[t]+alpha*(outcome_probabilistic[t] - Q[t]);
}


plot(Q,type='l',col='red',xlab='trial number')
title('Probabilistic',sub=paste0("alpha = ",a,sep=" "))

}

```

#####  12 trials; probability = 0.2

```{r,echo=F}


n_trials = 12
probability = 0.2; #edit this to change the probability of reward

#klp - loop through some different learning rates

cat("Simulated learning rates. 12 trials; probability = 0.2\n")

for (a in seq(0, 0.9, by = 0.1)) {
  
  alpha <- a   #learning rate (between 0 and 1)
  
outcome_deterministic = rep(1,n_trials)
outcome_probabilistic = ifelse(runif(n_trials)<probability,1,0)

Q=rep(0,n_trials)
par(mfrow=c(2,1))
for (t in 1:(n_trials-1)){
  Q[t+1]=Q[t]+alpha*(outcome_deterministic[t] - Q[t]);
}
plot(Q,type='l',col='blue',xlab='trial number')
title('Deterministic')

for (t in 1:(n_trials-1)){
  Q[t+1]=Q[t]+alpha*(outcome_probabilistic[t] - Q[t]);
}


plot(Q,type='l',col='red',xlab='trial number')
title('Probabilistic',sub=paste0("alpha = ",a,sep=" "))

}

```

#####  18 trials; probability = 0.5

```{r,echo=F}


n_trials = 18
probability = 0.5; #edit this to change the probability of reinforcement (scream)

#klp - loop through some different learning rates

cat("Simulated learning rates. 18 trials; probability = 0.5\n")

for (a in seq(0, 0.9, by = 0.1)) {
  
  alpha <- a   #learning rate (between 0 and 1)
  
outcome_deterministic = rep(1,n_trials)
outcome_probabilistic = ifelse(runif(n_trials)<probability,1,0)

Q=rep(0,n_trials)
par(mfrow=c(2,1))
for (t in 1:(n_trials-1)){
  Q[t+1]=Q[t]+alpha*(outcome_deterministic[t] - Q[t]);
}
plot(Q,type='l',col='blue',xlab='trial number')
title('Deterministic')

for (t in 1:(n_trials-1)){
  Q[t+1]=Q[t]+alpha*(outcome_probabilistic[t] - Q[t]);
}


plot(Q,type='l',col='red',xlab='trial number')
title('Probabilistic',sub=paste0("alpha = ",a,sep=" "))

}

```

#####  18 trials; probability = 0.2

```{r,echo=F}


n_trials = 18
probability = 0.2; #edit this to change the probability of reinforcement (scream)

#klp - loop through some different learning rates

cat("Simulated learning rates. 18 trials; probability = 0.2\n")

for (a in seq(0, 0.9, by = 0.1)) {
  
  alpha <- a   #learning rate (between 0 and 1)
  
outcome_deterministic = rep(1,n_trials)
outcome_probabilistic = ifelse(runif(n_trials)<probability,1,0)

Q=rep(0,n_trials)
par(mfrow=c(2,1))
for (t in 1:(n_trials-1)){
  Q[t+1]=Q[t]+alpha*(outcome_deterministic[t] - Q[t]);
}
plot(Q,type='l',col='blue',xlab='trial number')
title('Deterministic')

for (t in 1:(n_trials-1)){
  Q[t+1]=Q[t]+alpha*(outcome_probabilistic[t] - Q[t]);
}


plot(Q,type='l',col='red',xlab='trial number')
title('Probabilistic',sub=paste0("alpha = ",a,sep=" "))

}

```
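The five simulation chunks above differ only in `n_trials` and `probability`; the repeated delta-rule loop could be collapsed into a small helper (a sketch only — `simulate_rw` is a hypothetical name, not used elsewhere in this script):

```r
# Rescorla-Wagner / delta-rule expectancy update, collapsed into one helper
simulate_rw <- function(n_trials, alpha, outcomes) {
  Q <- numeric(n_trials)                               # expectancy starts at 0
  for (t in 1:(n_trials - 1)) {
    Q[t + 1] <- Q[t] + alpha * (outcomes[t] - Q[t])    # prediction-error update
  }
  Q
}

q <- simulate_rw(12, 0.5, rep(1, 12))                  # deterministic reinforcement
```

With deterministic outcomes the expectancy climbs monotonically towards 1, approaching it faster for larger alpha.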


#### Plot subset of trajectories in flare

```{r subset FLARe data,echo=F}

library(data.table)
library(ggplot2)
library(reshape2)
library(dplyr)

save <- "/Users/kirstin/Dropbox/SGDP/FLARe/FLARe_MASTER/Projects/LatentGrowth/Figures/"
dat <- fread("/Users/kirstin/Dropbox/SGDP/FLARe/FLARe_MASTER/Projects/LatentGrowth/Datasets/FC_t1_mplus_data.csv",data.table = F)

## randomly select 10 individuals.

set.seed(2016)
file <- sample_n(dat,10)


ap <- subset(file,select=c("Subject_ID","FCT1_1expCSp_1", 
             "FCT1_1expCSp_2", "FCT1_1expCSp_3", "FCT1_1expCSp_4", "FCT1_1expCSp_5", 
             "FCT1_1expCSp_6", "FCT1_1expCSp_7", "FCT1_1expCSp_8", "FCT1_1expCSp_9", 
             "FCT1_1expCSp_10", "FCT1_1expCSp_11", "FCT1_1expCSp_12"))

am <- subset(file,select=c("Subject_ID","FCT1_1expCSm_1", 
             "FCT1_1expCSm_2", "FCT1_1expCSm_3", "FCT1_1expCSm_4", "FCT1_1expCSm_5", 
             "FCT1_1expCSm_6", "FCT1_1expCSm_7", "FCT1_1expCSm_8", "FCT1_1expCSm_9", 
             "FCT1_1expCSm_10", "FCT1_1expCSm_11", "FCT1_1expCSm_12"))

ep <- subset(file,select=c("Subject_ID","FCT1_3expCSp_1", "FCT1_3expCSp_2", "FCT1_3expCSp_3", 
             "FCT1_3expCSp_4", "FCT1_3expCSp_5", "FCT1_3expCSp_6", "FCT1_3expCSp_7", 
             "FCT1_3expCSp_8", "FCT1_3expCSp_9", "FCT1_3expCSp_10", "FCT1_3expCSp_11", 
             "FCT1_3expCSp_12", "FCT1_3expCSp_13", "FCT1_3expCSp_14", "FCT1_3expCSp_15", 
             "FCT1_3expCSp_16", "FCT1_3expCSp_17", "FCT1_3expCSp_18"))

em <- subset(file,select=c("Subject_ID","FCT1_3expCSm_1", 
             "FCT1_3expCSm_2", "FCT1_3expCSm_3", "FCT1_3expCSm_4", "FCT1_3expCSm_5", 
             "FCT1_3expCSm_6", "FCT1_3expCSm_7", "FCT1_3expCSm_8", "FCT1_3expCSm_9", 
             "FCT1_3expCSm_10", "FCT1_3expCSm_11", "FCT1_3expCSm_12", "FCT1_3expCSm_13", 
             "FCT1_3expCSm_14", "FCT1_3expCSm_15", "FCT1_3expCSm_16", "FCT1_3expCSm_17", 
             "FCT1_3expCSm_18"))



## melt to longform

apmt <- melt(ap,
           id.var="Subject_ID")
apmt <- apmt[(order(apmt$Subject_ID)),]
apmt$Trial <- rep(1:12)

ammt <- melt(am,
           id.var="Subject_ID")
ammt <- ammt[(order(ammt$Subject_ID)),]
ammt$Trial <- rep(1:12)

epmt <- melt(ep,
           id.var="Subject_ID")
epmt <- epmt[(order(epmt$Subject_ID)),]
epmt$Trial <- rep(1:18)

emmt <- melt(em,
           id.var="Subject_ID")
emmt <- emmt[(order(emmt$Subject_ID)),]
emmt$Trial <- rep(1:18)

## plot lines and box plots for the subset

# acq CS+
acp <- ggplot(apmt,
              aes(Trial, value))            +
  geom_boxplot(aes(group=variable))         +
    geom_line(aes(group = Subject_ID,
                  color=Subject_ID))        +
  scale_color_gradientn(colors=rainbow(10)) +
  theme(legend.position = "none")           +
  ggtitle("CS+ acquisition")

# acq CS-
acm <- ggplot(ammt,
              aes(Trial, value))            +
  geom_boxplot(aes(group=variable))         +
    geom_line(aes(group = Subject_ID,
                  color=Subject_ID))        +
  scale_color_gradientn(colors=rainbow(10)) +
  theme(legend.position = "none")           +
  ggtitle("CS- acquisition")


# Ext CS+
exp <- ggplot(epmt,
              aes(Trial, value))            +
  geom_boxplot(aes(group=variable))         +
    geom_line(aes(group = Subject_ID,
                  color=Subject_ID))        +
  scale_color_gradientn(colors=rainbow(10)) +
  theme(legend.position = "none")           +
  ggtitle("CS+ Extinction")

# Ext CS-
exm <- ggplot(emmt,
              aes(Trial, value))            +
  geom_boxplot(aes(group=variable))         +
    geom_line(aes(group = Subject_ID,
                  color=Subject_ID))        +
  scale_color_gradientn(colors=rainbow(10)) +
  theme(legend.position = "none")           +
  ggtitle("CS- Extinction")

acp
acm
exp
exm
```


### Try RStan

See if the basic punishment-only learning model for the CS+ and CS- works with the FLARe master data.

#### Run the 8schools check

From the [rstan github](https://github.com/stan-dev/rstan/wiki/RStan-Getting-Started)

This is to check that everything compiles and runs, and to give an idea of the data format etc.



#### Set up procedure to create and sync models.

This is stored on my local machine here ***/Users/kirstin/Dropbox/SGDP/FLARe/FLARe_MASTER/Projects/Hierachal_modelling/Scripts*** and is linked to the remote github [repository here](https://github.com/klpurves/FLARe_Bayesian_hierarchical).  

##### Make sure the most up to date stan file is in the remote repo


```{bash update from git}

git pull Bayes_modelling
   
```



# Analyses

### Function block


```{r call libraries, echo=F,results=F}

#rm(list=ls())

library(tibble)
library(psych)
library(tidyverse)


# directories
workingdir='/Users/kirstin/Dropbox/SGDP/FLARe/FLARe_MASTER/Projects/Hierachal_modelling/Modelling'
scriptdir='/Users/kirstin/Dropbox/SGDP/FLARe/FLARe_MASTER/Projects/Hierachal_modelling/Scripts'
datadir='/Users/kirstin/Dropbox/SGDP/FLARe/FLARe_MASTER/Projects/LatentGrowth/Datasets/'

## write function to set each block as test or not individually.
## three testing levels - min, med, max. 
## if off, then will set to normal stan parameter of 1000 warmup and 4000 iterations on 4 chains


```


#### test function

A function for running minimal, medium or maximal tests of the stan model. This changes how many chains and iterations are run.

```{r testing stan}
testing <- function(x) {
  
  if (x %in% c('min',"Min")) {
    chain_iter<<-400
    warm_up<<-100
    chain_n<<-1
  } else if ( x %in% c('med' ,'Med')) {
    
    chain_iter<<-1000
    warm_up<<-500
    chain_n<<-1
    
  } else if ( x %in% c('max','Max')) {
    
    chain_iter<<-2000
    warm_up<<-1000
    chain_n<<-2
    
  } else if ( x %in% c('full' ,'Full')) {
    
    chain_iter<<-4000
    warm_up<<-1000
    chain_n<<-4
    
  } else if ( x %in% c('skip' ,'Skip')) {
    
    chain_iter<<-0
    warm_up<<-0
    chain_n<<-0
    
  } 

}
```
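An alternative design (just a sketch; `presets` and `testing2` are hypothetical names not used elsewhere in this script) keeps the same presets in a lookup table, avoiding the if/else chain and the global `<<-` assignments:

```r
# hypothetical table-based alternative to the if/else chain above
presets <- list(
  min  = list(chain_iter = 400,  warm_up = 100,  chain_n = 1),
  med  = list(chain_iter = 1000, warm_up = 500,  chain_n = 1),
  max  = list(chain_iter = 2000, warm_up = 1000, chain_n = 2),
  full = list(chain_iter = 4000, warm_up = 1000, chain_n = 4),
  skip = list(chain_iter = 0,    warm_up = 0,    chain_n = 0)
)

testing2 <- function(x) presets[[tolower(x)]]   # case-insensitive lookup

p <- testing2("Min")
```

Returning a list rather than assigning globals makes the chain settings explicit at each call site.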

#### Model run, load or skip


A function for either running the model, loading in the fit if it already exists and doesn't need redoing, or skipping the block entirely.

x is the testing, skip or load command
y is the stan script
z is the flare_data set to use.

*Note that this needs scriptdir and datadir to exist in the workspace*

```{r run or load}

model_run <- function(x,y,z) {

  if (x %in% c('skip',"Skip")) {
    
    stop("skipping this model")
    
  }
    
  if (x %in% c('min',"med","max","Min","Med","Max","full","Full")) {
    
  print("running model")
  
  testing(x)
  stanname = y
  stanfile <- file.path(scriptdir, stanname)
  flare_data <- z
  # note that flare_data is set up elsewhere (see block below)
  flare_fit <- stan(file = stanfile, data = flare_data, iter=chain_iter, chains = chain_n) #add working dir?
  save_name <- gsub(".stan",".rds", stanname)
  saveRDS(flare_fit, file=file.path(datadir,save_name))
  
  print(traceplot(flare_fit,'lp__'))
  
  # return fit summary (note: the full stanfit is saved to rds above; reload it from disk if you need the log likelihoods)
  return(summary(flare_fit))
  
  } 
  
  if (x %in% c('load',"Load")) {
    
    print("Loading existing model fit data")
    stanname = gsub(".stan",".rds", y)
    fitfile <- readRDS(file=paste0(datadir,stanname))
    print(traceplot(fitfile,'lp__'))
    return(summary(fitfile))
    
    
    
  }
}

```

#### out describe


Function for describing the mean etc of freely estimated parameters from STAN output

```{r stan out descriptives}

out_describe<- function(summary,n,all = NULL){
  
library(dplyr)

  print(paste0(chain_iter, " iterations on ", chain_n, " chains."))

  print(paste("Estimated",(dim(summary$summary)[1]-1) / n,"free parameters per person",sep=" "))

  summary <- data.frame(summary$summary[1:(dim(summary$summary)[1]-1),])

table <- summary %>%
  mutate(parameter = rep(1:(dim(summary)[1]/n),each = n )) %>%
  group_by(parameter) %>%
  summarize(mean = mean(mean,na.rm=T), 
            se_mean = mean(se_mean,na.rm=T), 
            sd = mean(sd,na.rm=T),
            Rhat = mean(Rhat,na.rm=T))

param_names <- row.names(summary)[(seq(1,dim(summary)[1],n))]

table$parameter <- param_names

if (is.null(all) & dim(table)[1] > 10) {
  
  print("This table is very large. Returning only the top 6 entries unless you have set the 3rd function option to 'all'. ")
  return(head(table))
  
} else {
  
  return(table)
}

}


```

#### BIC

Canonical BIC function from log likelihood, courtesy of Alex Pike

```{r canonical BIC}
## canonical BIC function (Alex Pike's)

bic<-function(trials,neg_log_like,nparam) {
  if (sum(neg_log_like<0)>0){print('check this is negative log likelihood!!')} 
  2*neg_log_like+nparam*log(trials) #canonical
}


```
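As a quick sanity check on toy numbers (the function is repeated here so the chunk is self-contained): 12 trials, a negative log likelihood of 5 and 2 free parameters give BIC = 2\*5 + 2\*log(12), roughly 14.97.

```r
# canonical BIC: 2 * negLL + k * log(n)   (copied from above so this chunk runs standalone)
bic <- function(trials, neg_log_like, nparam) {
  if (sum(neg_log_like < 0) > 0) {print('check this is negative log likelihood!!')}
  2 * neg_log_like + nparam * log(trials)
}

bic(12, 5, 2)
```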


#### model plot

A function for plotting a BIC bar chart comparing the models contained in a dataset.

```{r compare models}
## model compare plot function
## note: relies on the odp helper defined below and on scales::rescale_none (scales is loaded later)

plot_models <- function(dataset) {
  
dataset$BIC <- odp(as.numeric(dataset$BIC))

dataset <- as.data.frame(na.omit(dataset))

yminv <- min(dataset$BIC,na.rm=T) -5
ymaxv <- max(dataset$BIC,na.rm=T) +5

plot <- ggplot(dataset,aes(x=reorder(model,BIC),y=BIC)) +
  geom_bar(stat="identity") +
  coord_flip() +
  labs(title = "Model comparison",
       y="Bayesian Information Criterion (BIC)") +
  scale_y_continuous(limits=c(yminv,ymaxv),oob=rescale_none)

show(plot)

}




```


#### ODP
The function below formats numbers to two decimal places using sprintf (despite the name, "%2.2f" gives two decimal places, not one).

```{r one decimal place function}

odp <- function(x) {
  as.numeric(sprintf("%2.2f",x))
}

```



## Create datasets

### notes

We need to rescale our dataset here to be between 0 and 1. 

Importantly, because for statistical reasons we use the proportion of trials that **are not** reinforced as a known parameter (we don't want proportions of .75 and 1; better to have .25 and 0), we define our rescaled expectancy values as 1 - rescaled(x). This means we can still interpret the results in the expected direction (i.e. a higher rating means greater expectation of the outcome).


### Expectancy data

Load in the week 1 app and lab data for the FLARe pilot, TRT and headphones studies, and make it long form.

Try the acquisition data first. This is formatted with no column names and no missing data.

Derive the n parameter for both files and check that these match.


#### set up trial number

```{r trial number}
# create the n trials variable for RStan
ntrials=12
```



```{r}

stanname='punish_only.stan'
minus_name <- 'bayes_acq_minus.csv'
plus_name <- "bayes_acq_plus.csv"

stanfile <- file.path(scriptdir, stanname)
minusfile <- file.path(datadir,minus_name)
plusfile <- file.path(datadir,plus_name)


minus <- fread(minusfile,data.table=F)
plus <-fread(plusfile,data.table=F)

nacqm <- dim(minus)[1]
nacqp <- dim(plus)[1]


## check that these match and create nsub variable for RStan

if (nacqm == nacqp) {
  print('subject number match')
  nsub <- nacqm
  
  print(paste('nsub set to',nsub,sep=" "))
} else {
  print('WARNING: subject number does not match. Check master dataset')
}

# check the file format is ok

minus[1:2,]
plus[1:2,]



```

The expectancy rating datasets look like they are formatted fine and ntrials and nsub variables should exist. 

#### make rating data binary


For now, to see if stan runs using the bernoulli-logit function, make binary responses from the expectancy ratings, i.e. >= 4.5 becomes 1, < 4.5 becomes 0. 



```{r}

binarise <- function(x) {
  ifelse(x >= 4.5,1,0)
}

```

```{r}


minusb <- data.frame(apply(minus,2,function(x) binarise(x)))

plusb <- data.frame(apply(plus,2,function(x) binarise(x)))

```


### Proportion screams data

This is a vector containing the absolute number of trials where no scream occurred for each stimulus. As there was a 75% reinforcement rate for the CS+ (9/12 trials), this is a vector of '3's. For the CS-, no trials were reinforced, so it is a vector of '12's.

```{r}

No_scream_p <- rep(3,nsub)
No_scream_m <- rep(12,nsub)

```

### Scream per trial data

Create datasets for the acquisition CS- and extinction CS+ and CS- reflecting that no screams occurred at all. Then use the pattern id variable to create a dataset for the acquisition CS+ indicating when a scream occurred for each participant.
```{r}


## Create the no scream datasets for all
screamMinus <- matrix(0L,nrow=nsub, ncol=ntrials)

library(data.table)

## read in the screams for acquisition 
screamPlus <- fread("/Users/kirstin/Dropbox/SGDP/FLARe/FLARe_MASTER/Projects/LatentGrowth/Datasets/bayes_screams_acq.csv",data.table=F)


# 
# ### for the time being, simulate the NA data for the studies where I havent yet finished cleaning the screams.
# 
# # make the first trial 1 for everyone, then add 8 additional random 1's per person. Do this in four random patterns to mimic the real data
# 
# sc1 <- c(1,1,0,1,0,0,1,1,1,1,1)
# sc2 <- c(0,1,1,1,0,0,1,1,1,1,1)
# sc3 <- c(1,1,1,0,1,0,1,0,1,1,1)
# sc4 <- c(1,0,1,1,0,0,1,1,1,1,1)
# 
# 
# screamPlus[,1] <- ifelse(is.na(screamPlus[,1]),1,screamPlus[,1])
# 
# # for (n in 1:dim(screamPlus)[1]) {
# #   print(n)
# #   screamPlus[n,2:12] <- sample(patts,1,replace=T)
# # }
# 
# for (n in 1:dim(screamPlus)[1]) {
#   
#   a <- sample(c(1,4),1)
#   
#   if (is.na(screamPlus[n,2])){
#     if (a == 1) {
#       screamPlus[n,2:12] <- sc1
#     } else if (a == 2) {
#       screamPlus[n,2:12] <- sc2
#     } else if (a == 3){
#       screamPlus[n,2:12] <- sc3
#     } else {
#       screamPlus[n,2:12] <- sc4
#     }
#   }
# }

```



## Create dataset for barplot comparing output

```{r}
library(ggplot2)

mod_comp <- data.frame(model=NA,BIC=NA)

```

### rescale data

Rescale the 1-9 expectancy values to be on a 0-1 scale.

Stan cannot deal with the extreme limits of the beta distribution, so make the rescaled limits just above 0 and just below 1.

Note that when a value had to be imputed because it was missing, it will not be an integer. Thus the function needs to allow for ranges between values.

```{r}

library(scales)

# rescale and flip so that we are effectively rating the expectation that they WILL NOT hear a scream to match stan

## rescaling such that the distribution spaces the numbers 1-9 evenly. the first interval upper bound is 0.11, then 0.22 etc. this means that the midpoint of each interval will be:

print("mid point of each evenly spaced interval representing values between 1-9")
seq(0.5/9,1,1/9)


## thus 1 will be 1-0.055 etc.

## NOTE: might want to consider making this more flexible. enter the number of choice options as a variable - would be very easy. add to function library at a later stage

scale_flare <- function(x){
  
  vals <- seq(0.5/9,1,1/9)
  
  for (val in 1:9){
    if (x > val-1 & x <= val){
      x <- 1 - vals[val]
    }
  }
  return(x)
}


## initialise minus_scaled dataframe.

minus_scaled <- data.frame(matrix(ncol=dim(minus)[2],nrow = dim(minus)[1]))

##  populate with rescaled values

for (sub in 1:dim(minus)[1]){
  for (col in 1:dim(minus)[2]){
    
    minus_scaled[sub,col] <- scale_flare(minus[sub,col])
  }
}

## ditto for plus

plus_scaled <- data.frame(matrix(ncol=dim(minus)[2],nrow = dim(minus)[1]))

for (sub in 1:dim(plus)[1]){
  for (col in 1:dim(plus)[2]){
    
    plus_scaled[sub,col] <- scale_flare(plus[sub,col])
  }
}



## this is the offset from each interval midpoint to its upper and lower boundaries (ratings represent the midpoints)

cdf_scale <- 1/18




```
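Since every rating x in (val-1, val] falls in interval ceiling(x), the per-element loop above is equivalent to a one-line vectorised lookup (a sketch; `scale_flare_vec` and `scale_flare_loop` are hypothetical names introduced here for comparison):

```r
vals <- seq(0.5/9, 1, 1/9)                           # interval midpoints for ratings 1-9

# vectorised equivalent: flip so a higher rating = greater scream expectancy
scale_flare_vec <- function(x) 1 - vals[ceiling(x)]

# loop version from the chunk above, copied here so this chunk runs standalone
scale_flare_loop <- function(x) {
  for (val in 1:9) {
    if (x > val - 1 & x <= val) x <- 1 - vals[val]
  }
  x
}

r <- c(1, 4.3, 9)                                    # includes a non-integer imputed value
```

Vectorising also removes the per-subject, per-column double loop when rescaling the whole data frame.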


## Set up stan 


These use Alex Pike's RStan script with minor modifications to make it punishment-only, to see if it runs. Testing that the approach works with the current data set up etc.  

The settings for the script are below, including stan chain parameters and directory set up.



This loads the libraries and source files needed to run this script, and sets up RStan

```{r, echo = F,results='hide',message=F}

# libraries and source files 
library('MASS')
library('boot')
library('dplyr')
library('reshape')
library('tidyr')
library('rstan') 
library('loo')    # this is model comparison package. helps extract loglikelihood too.
library('data.table')

#options for RSTAN
options(mc.cores = parallel::detectCores())
rstan_options(auto_write = TRUE)
Sys.setenv(LOCAL_CPPFLAGS = '-march=native')
Sys.getenv('LOCAL_CPPFLAGS') #should say '-march=native'

#functions (if and when relevant and added)
# source('/Users/kirstin/Dropbox/SGDP/Function_library/<<function script name>>')
source('/Users/kirstin/Dropbox/SGDP/Function_library/not_in.R') # Not in %!in% function

```

## Stan data

```{r stan data}

## Test data (Pilot + TRT + Validation) proportion no screams


#data
data_files<-list(ntrials=ntrials,nsub=nsub,nothingPlus = No_scream_p, nothingMinus=No_scream_m,ratingsPlus=plus_scaled,ratingsMinus=minus_scaled)


## Test data (Pilot + TRT + Validation) screams per trial, no log likelihood

flare_data_nolog <-list(ntrials=ntrials,nsub=nsub,screamPlus = t(screamPlus), screamMinus= t(screamMinus),ratingsPlus=t(plus_scaled),ratingsMinus=t(minus_scaled))

## Test data (Pilot + TRT + Validation) screams per trial, with cdf_scale
flare_data<-list(ntrials=ntrials,nsub=nsub,screamPlus = t(screamPlus), screamMinus=t(screamMinus),ratingsPlus=t(plus_scaled),ratingsMinus=t(minus_scaled),cdf_scale=cdf_scale)


## Validation data (TEDS) scaled 
# 
# TEDS_data<-list(ntrials=ntrials,nsub=nsub,screamPlus = t(screamPlus), screamMinus= t(screamMinus),ratingsPlus=t(plus_scaled),ratingsMinus=t(minus_scaled))

```

# Baseline models

## Model 1: single beta no scaling {.tabset}

### notes

Because we use the 1 - rescaled expectancy data, there is no need to invert to reinforcement parameters here. As a result the stan model simply needs to be:   

```
alphaPlus[p] =  nothingPlus[p]/ntrials;
alphaMinus[p] =  nothingMinus[p]/ntrials;
```

### run Alex Pike's stan script for non scaled beta model.

Here we try to estimate the alpha parameter of the beta distribution per trial, per person, per stimulus (each beta distribution has two sufficient parameters, alpha and beta; we want to estimate the alpha). 

Eventually we will scale these by the actual 'value' of the scream for each person per trial. 

Using data loaded in from preliminary tests above.

so this is a beta value per person (assuming the underlying process for the plus and minus are the same)

### Model

```{r no scaling set up}
#script
stanname='beta_noscaling.stan'

#data
data_files<-list(ntrials=ntrials,nsub=nsub,nothingPlus = No_scream_p, nothingMinus=No_scream_m,ratingsPlus=plus_scaled,ratingsMinus=minus_scaled)

```

```{r beta no scaling}

flare_fit <- model_run('load','beta_noscaling.stan',data_files)

## get some basic output descriptions printed to screen
out_describe(flare_fit,nsub)

# extract fit data
summary_flare <- summary(flare_fit)

```


## Model 2: single beta scaled {.tabset}

### notes

Simple alteration of the first model. We estimate a scaling parameter per person over all trials and apply it to the alpha component per participant.

### run Alex Pike's stan script for scaled beta model.

Here we try to estimate the alpha parameter of the beta distribution per trial, per person, per stimulus (each beta distribution has two sufficient parameters, alpha and beta; we want to estimate the alpha). 

Eventually we will scale these by the actual 'value' of the scream for each person per trial. 

Using data loaded in from preliminary tests above.

so this is a beta value per person (assuming the underlying process for the plus and minus are the same)

### Model

```{r scaled  set up,echo=F}

#script
stanname='beta_scaling.stan'

```

```{r beta scaled}

flare_fit <- model_run('load',stanname,data_files)

## get some basic output descriptions printed to screen
out_describe(flare_fit,nsub)

# extract fit data
summary_flare <- summary(flare_fit)

```


### run Alex Pike's stan script for the beta model with RL.

Here we try to estimate the alpha parameter of the beta distribution per trial, per person, per stimulus (each beta distribution has two sufficient parameters, alpha and beta; we want to estimate the alpha). 

Eventually we will scale these by the actual 'value' of the scream for each person per trial. 

Using data loaded in from preliminary tests above.

so this is a beta value per person (assuming the underlying process for the plus and minus are the same)


```{r beta RL set up}
#script
stanname='beta_withRL.stan'

```

```{r beta RL}

flare_fit <- model_run('load',stanname,flare_data_nolog)

## get some basic output descriptions printed to screen
out_describe(flare_fit,nsub)

# extract fit data
summary_flare <- summary(flare_fit)

```


## Model 3: RL, mean defined, single beta {.tabset}

### notes

This model includes an alpha learning parameter per person, estimating their learning rate and updating based on it. It needs a dataset indicating whether a scream occurred on each trial, instead of the proportion of trials with no scream.


### Mean to define shape

This model includes an alpha learning parameter per person, estimating their learning rate and updating based on it. It needs a dataset indicating whether a scream occurred on each trial, instead of the proportion of trials with no scream.

Alex used this [stack post](https://stats.stackexchange.com/questions/12232/calculating-the-parameters-of-a-beta-distribution-using-the-mean-and-variance) to help solve the shape parameters using mean and sd where we assume that v serves as the mean and beta as the sd.

the equations work out to this:

for shape 1: 


$$\alpha = \left(\frac{1-\mu}{\sigma^2} - \frac{1}{\mu}\right)\mu^2$$

for shape 2: 

$$\beta=\alpha \left(\frac{1}{\mu}-1\right)$$
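A quick numerical check (not part of the original script) that these equations invert the beta mean and variance correctly, with sigma squared read as the variance:

```r
mu <- 0.3
v  <- 0.01                                 # sigma^2 (variance)

a <- ((1 - mu) / v - 1 / mu) * mu^2        # shape 1
b <- a * (1 / mu - 1)                      # shape 2

# recovered moments of Beta(a, b)
c(mean = a / (a + b),
  var  = a * b / ((a + b)^2 * (a + b + 1)))
```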

### Model

This way of defining the mean does not work (it does not even run), so we skip it.

```{r mean 1 set up}
#script
stanname='beta_meansd_RL.stan'

```


```{r mean 1}

flare_fit <- model_run('skip',stanname,flare_data_nolog)

## get some basic output descriptions printed to screen
out_describe(flare_fit,nsub)

# extract fit data
summary_flare <- summary(flare_fit)

```


On 500 iterations (i.e. test) the variance in alpha is good, but the traceplot is terrible. The model converges very poorly. We also have to constrain the beta to be between 0 and 0.0001. Not sure why this is. 

When running for 2000 iterations (1000 warmup), this results in the following warnings:

>> There were 2644 divergent transitions after warmup. Increasing adapt_delta above 0.8 may help. See
>> http://mc-stan.org/misc/warnings.html#divergent-transitions-after-warmup
>> There were 4 transitions after warmup that exceeded the maximum treedepth. Increase max_treedepth above 10. See
>> http://mc-stan.org/misc/warnings.html#maximum-treedepth-exceeded
>> There were 4 chains where the estimated Bayesian Fraction of Missing Information was low. See
>> http://mc-stan.org/misc/warnings.html#bfmi-low
>> Examine the pairs() plot to diagnose sampling problems

### Mean definition 2

The above mean definition does not map the data well (terrible traceplot!). I found [this from the MRC BSU](https://www.mrc-bsu.cam.ac.uk/wp-content/uploads/bugsbook_chapter5.pdf) and have tried defining the beta parameters assuming V == mean in a slightly different way:

for parameter a:

$$\alpha = \mu\beta/(1-\mu)$$

for parameter b:

$$\beta = \mu(1-\mu)^2/\sigma+\mu-1$$

### Model

Still using a single beta here.

Skipping, as this definition also does not run.

```{r mean 2 set up}
#script
stanname='beta_meansd_RL_2.stan'

```


```{r mean 2}

flare_fit <- model_run('skip',stanname,flare_data_nolog)

## get some basic output descriptions printed to screen
out_describe(flare_fit,nsub)

# extract fit data
summary_flare <- summary(flare_fit)

```


### Mean definition 3

Noted that the shape parameters have slight variations in definition according to the discussion [here](https://stats.stackexchange.com/questions/12232/calculating-the-parameters-of-a-beta-distribution-using-the-mean-and-variance). Updated the script slightly to reflect this, based on the reply from ocram.

the first uncertainty term in shape 1 changes from sigma squared to sigma (i.e. our uncertainty parameter is used directly), so it changes from:   

$$\alpha = \left(\frac{1-\mu}{\sigma^2} - \frac{1}{\mu}\right)\mu^2$$

to 

$$\alpha = \left(\frac{1-\mu}{\sigma} - \frac{1}{\mu}\right)\mu^2$$


Changes the shape 2 parameter definition from:

$$\beta=\alpha \left(\frac{1}{\mu}-1\right)$$   

to 


$$\beta = \left(\frac{1-\mu}{\sigma} - \frac{1}{\mu}\right)\mu\left(1-\mu\right)$$

Because this works best, we will add the log likelihood calculation here. This is based on the beta probability density function, given the participants' actual ratings and the sufficient parameters of the distribution per trial.

```
loglik[p] = loglik[p] +
     beta_lpdf(ratingsPlus[t,p]|shape1_Plus[t,p],shape2_Plus[t,p]) +
     beta_lpdf(ratingsMinus[t,p]|shape1_Minus[t,p],shape2_Minus[t,p]);
```
### Model
         
Skip, as this definition also does not run.


```{r mean 3 set up}
#script
stanname='beta_meansd_RL_3.stan'

```


```{r mean 3}

flare_fit <- model_run('skip',stanname,flare_data_nolog)

## get some basic output descriptions printed to screen
out_describe(flare_fit,nsub)

# extract fit data
summary_flare <- summary(flare_fit)

```


This model is substantially better than either of the other two. The traceplot suggests that the iterations converge as we would like. However, we still need to massively constrain the beta (i.e. confidence / uncertainty) estimates for it to run, otherwise the starting values drop below zero. 


### Mean definition 4

Here I try to define the parameter using simplified mean and precision estimates as per [this tutorial](http://quantdevel.com/public/CSP2017/ModelingProportionsAndProbabilities.pdf). See in particular the parameter estimation on the cubs data.


This results in a relatively simplified parameter estimation compared to model 3. 

$$\alpha = \mu * ((\mu * (1-\mu)) / \sigma - 1)$$

where mu is the mean (or value) and sigma is the variance / uncertainty parameter we currently call beta.

and the b (or shape 2) parameter for the distribution is:

$$\beta = (1- \mu) * ((\mu * (1-\mu)) / \sigma - 1)$$
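A quick numerical check (not part of the original script): with sigma read as the variance, this simplified form is algebraically identical to the definition 3 shapes above.

```r
mu <- 0.3
sigma <- 0.01                                        # uncertainty term, read as variance

# definition 3 shapes
a3 <- ((1 - mu) / sigma - 1 / mu) * mu^2
b3 <- ((1 - mu) / sigma - 1 / mu) * mu * (1 - mu)

# definition 4 shapes
a4 <- mu * ((mu * (1 - mu)) / sigma - 1)
b4 <- (1 - mu) * ((mu * (1 - mu)) / sigma - 1)

c(a3 = a3, a4 = a4, b3 = b3, b4 = b4)
```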

### Model

```{r mean 4 set up}
#script
stanname='beta_meansd_RL_4.stan'

```


```{r mean 4}

flare_fit <- model_run('load',stanname,flare_data)

## get some basic output descriptions printed to screen
out_describe(flare_fit,nsub)

# extract fit data
summary_flare <- summary(flare_fit)

```



#### Create BIC from log likelihood

```{r}
## extract log likelihood (needs the stanfit itself, not the summary returned by model_run, so reload the saved fit)

flare_fit_obj <- readRDS(file.path(datadir,"beta_meansd_RL_4.rds"))
flare_loglike <- extract_log_lik(flare_fit_obj, parameter_name = "loglik", merge_chains = TRUE)

#calculate BIC

FLARe_bic<-bic(ntrials,-colMeans(flare_loglike),2) # 2 free parameters in this model

## mean BIC as model comparisons tool:

print("Mean Bayesian information criterion for model")
mean(FLARe_bic)

```


#### Add to bar plot

```{r}

mod_comp <- rbind(mod_comp,c("Means 1 beta",as.numeric(mean(FLARe_bic))))

mod_comp$BIC <- odp(as.numeric(mod_comp$BIC))

mod_comp <- as.data.frame(na.omit(mod_comp))

## plot function - create plot
plot_models(mod_comp)

```


## Model 4: RL, mode defined, single beta {.tabset}

### notes 

Used [this post](http://doingbayesiandataanalysis.blogspot.com/2012/06/beta-distribution-parameterized-by-mode.html) to guide this, particularly:

>> For a beta distribution with shape parameters a and b, the mode is (a-1)/(a+b-2). Suppose we have a desired mode, and we want to determine the corresponding shape parameters. Here's the solution. First, we express the "certainty" of the estimate in terms of the equivalent prior sample size, k = a + b, with k >= 2. The certainty must be at least 2 because it essentially assumes that the prior contains at least one "head" and one "tail," which is to say that we know each outcome is at least possible. Then a little algebra reveals:
>>
>> a = mode * (k-2) + 1
>> b = (1-mode) * (k-2) + 1
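A quick numerical check (not part of the original script) that this parameterisation inverts correctly, recovering the mode and the certainty k from the shapes:

```r
mode_v <- 0.25
k      <- 10                          # certainty, k = a + b, with k >= 2

a <- mode_v * (k - 2) + 1
b <- (1 - mode_v) * (k - 2) + 1

c(mode = (a - 1) / (a + b - 2), k = a + b)
```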



### shape 1 as mode with v and beta as beta shape parameters 

For this version we try and estimate the 'mode' to be shape 1. KIRSTIN:: explain here

### Model

This first attempt doesn't work, so skip it.

```{r mode 1 set up}
#script
stanname='beta_mode_RL.stan'

```


```{r mode 1}

flare_fit <- model_run('skip',stanname,flare_data_nolog)

## get some basic output descriptions printed to screen
out_describe(flare_fit,nsub)

# extract fit data
summary_flare <- summary(flare_fit)

```

### v as mode

For this version we assume that V is the mode (above we assumed it serves as the mean) and beta is the certainty aspect (i.e. k)

What this does is basically treat the expected rating (value) as the a parameter for the distribution (scaled by their certainty, beta) and 1 minus that value as the b parameter (again, scaled by the uncertainty). 

So you have a ratio of their selected value per trial (the mode across iterations?) to how far from the highest possible choice they are.

### Model

```{r mode 2 set up}
#script
stanname='beta_mode_RL_2.stan'

```


```{r mode 2}

flare_fit <- model_run('med',stanname,flare_data)

## get some basic output descriptions printed to screen
out_describe(flare_fit,nsub)

# extract fit data
summary_flare <- summary(flare_fit)

```


This works, but there is not a lot of variance in the alpha parameter when defined by mode (mean = 0.49, sd = 0.06), compared to when defined by mean (mean = 0.54, sd = 0.26).

However, there is a lot of variation in the beta parameter (mean = -7.21, sd = 134.74).

#### Create BIC from log likelihood

```{r}
## extract log likelihood (needs the stanfit itself, not the summary returned by model_run, so reload the saved fit)

flare_fit_obj <- readRDS(file.path(datadir,"beta_mode_RL_2.rds"))
flare_loglike <- extract_log_lik(flare_fit_obj, parameter_name = "loglik", merge_chains = TRUE)

#calculate BIC

FLARe_bic<-bic(ntrials,-colMeans(flare_loglike),2) # 2 free parameters in this model

## mean BIC as model comparisons tool:

print("Mean Bayesian information criterion for model")
mean(FLARe_bic)

```


#### Add to bar plot

```{r}

mod_comp <- rbind(mod_comp,c("Mode 1 beta",as.numeric(mean(FLARe_bic))))

mod_comp$BIC <- odp(as.numeric(mod_comp$BIC))

mod_comp <- as.data.frame(na.omit(mod_comp))


## plot function - create plot
plot_models(mod_comp)

```


## Model 5: RL mean defined, two beta {.tabset}

### Model

RL model adding a beta per stimulus to Alex's model

```{r mean 2 beta set up}
#script
stanname='beta_meansd_2beta_RL.stan'

```


```{r mean 2 beta}

flare_fit_m2 <- model_run('full',stanname,flare_data)

## get some basic output descriptions printed to screen
out_describe(flare_fit_m2,nsub)

# extract fit data
summary_flare <- summary(flare_fit_m2)

```

#### Create BIC from log likelihood

```{r mean 2 beta bic}
## extract log likelihood

flare_loglike_m2 <- extract_log_lik(flare_fit_m2, parameter_name = "loglik", merge_chains = TRUE)

#calculate BIC

FLARe_bic_m2 <- bic(ntrials,-colMeans(flare_loglike_m2),3) # number of parameters in this model (alpha plus two betas = 3)

# mean for all participants

mean(FLARe_bic_m2)

```


#### Add to bar plot

```{r}

mod_comp <- rbind(mod_comp,c("Means 2 beta",mean(FLARe_bic_m2)))


mod_comp$BIC <- odp(as.numeric(mod_comp$BIC))

mod_comp <- as.data.frame(na.omit(mod_comp))


## plot function - create plot
plot_models(mod_comp)

```


## Model 6: RL mode defined, two beta {.tabset}

### Model
RL model adding a beta per stimulus to the model that defines the beta shape using the mode instead of the mean. This definitely makes more sense, as we assume participants will have different levels of uncertainty about each stimulus.


```{r mode 2 beta set up}
#script
stanname='beta_mode_2beta_RL_2.stan'

```


```{r mode 2 beta}

flare_fit <- model_run('full',stanname,flare_data_nolog)

## get some basic output descriptions printed to screen
out_describe(flare_fit,nsub)

# extract fit data
summary_flare <- summary(flare_fit)

```


The alpha parameter variance is normal (mean = 0.4, sd = 0.12). Beta is much more bounded now, though (combined across both stimuli: mean = 0.79, sd = 1.6), over 4000 iterations on 4 chains.


#### Create BIC from log likelihood

```{r}
## extract log likelihood

flare_loglike <- extract_log_lik(flare_fit, parameter_name = "loglik", merge_chains = TRUE)

#calculate BIC

FLARe_bic<-bic(ntrials,-colMeans(flare_loglike),3) # number of parameters in this model (alpha plus two betas = 3)

## mean BIC as model comparisons tool:

print("Mean Bayesian information criterion for model")
mean(FLARe_bic)

```

#### Add to bar plot

```{r}

mod_comp <- rbind(na.omit(mod_comp),c("Mode 2 beta",mean(FLARe_bic)))

mod_comp$BIC <- odp(as.numeric(mod_comp$BIC))

mod_comp <- as.data.frame(na.omit(mod_comp))


## plot function - create plot
plot_models(mod_comp)

```

## Model 7: RL mean defined, no beta {.tabset}

### Model

The beta doesn't work as well for the CS+ stimulus, so we need to check whether this parameter adds anything to the model: drop it from our best mean model and see how this changes the fit.

This takes forever to run and the log likelihood extraction fails, so no idea whether it is good yet - come back to this. ***skipped for now***


```{r mean NO beta set up}
#script
stanname='beta_meansd_RL_NoBeta.stan'

```


```{r mean No beta}

flare_fit <- model_run('skip',stanname,flare_data_nolog)

## get some basic output descriptions printed to screen
out_describe(flare_fit,nsub)

# extract fit data
summary_flare <- summary(flare_fit)

```


#### Create BIC from log likelihood

```{r}
# 
# ## extract log likelihood
# 
# flare_loglike <- extract_log_lik(flare_fit, parameter_name = "loglik", merge_chains = TRUE)
# 
# #calculate BIC
# 
# FLARe_bic<-bic(ntrials,-colMeans(flare_loglike),2) #number of parameters in that model e.g. 4)
# 
# ## mean BIC as model comparisons tool:
# 
# print("Mean Bayesian information criterion for model")
# mean(FLARe_bic)

```

#### Add to bar plot

```{r}
# 
# mod_comp <- rbind(na.omit(mod_comp),c("Mean no beta",mean(FLARe_bic)))
# 
# #
# mod_comp$BIC <- odp(as.numeric(mod_comp$BIC))
# 
# mod_comp <- as.data.frame(na.omit(mod_comp))
# 
## plot function - create plot
# plot_models(mod_comp)

```

# Generate and recover 

Here I test whether the model is working well by seeing if I can use the parameters we've estimated to generate our existing rating data, and then recover similar parameters again.

I will do this for the best fitting model (mean-defined beta distribution with a variance estimate per person for each stimulus). This is the model where we treat the iterated ratings as if they are 'expected' values and use them as the shape 1 parameter for our beta distribution at each trial. We have allowed a beta (or uncertainty) parameter per stimulus.

A good model will have **a)** a good correlation between the real data and the data generated and **b)** a good correlation between the parameter estimates from the real and generated data.

We basically want to replicate our Stan script, but instead of estimating parameters, we assume that we know what they are (i.e. use the alphas and betas we have estimated previously).

**Update**: it turns out the single-beta model is the best fitting model once I correct my BIC function to include the negative log likelihood. So I will also generate and recover for this model and use it as the comparator.


## Mean 1 beta
### Generate 
##### Make alpha / beta datasets p/p

Use the summary of the stan model to extract the different parameters we want to try to use to recreate our data.

```{r}

params <- summary(flare_fit_best)
alpha_est <- data.frame(params$summary[1:nsub,1])
beta <- data.frame(params$summary[(nsub+1):(nsub*2),1])


names(beta) <- "beta"

```

##### Initialise empty datasets to hold the predicted ratings

```{r}

rating_est_plus <- data.frame(matrix(ncol=ntrials,nrow=nsub))
rating_est_minus <- data.frame(matrix(ncol=ntrials,nrow=nsub))

# beta shape parameters
shape1p <- data.frame(matrix(ncol=ntrials,nrow=nsub))
shape1m <- data.frame(matrix(ncol=ntrials,nrow=nsub))
shape2p <- data.frame(matrix(ncol=ntrials,nrow=nsub))
shape2m <- data.frame(matrix(ncol=ntrials,nrow=nsub))

# V parameters (initialised at 0.5)
vp <- data.frame(matrix(ncol=ntrials,nrow=nsub))
vm <- data.frame(matrix(ncol=ntrials,nrow=nsub))

vp[1] <- 0.5
vm[1] <- 0.5

# prediction error 

dp <- data.frame(matrix(ncol=(ntrials-1),nrow=nsub)) 
dm <- data.frame(matrix(ncol=(ntrials-1),nrow=nsub)) 

  
```

##### Simulate ratings

Use our extracted parameters in place of estimating them, following the Stan syntax.


######  Populate our vplus and delta frames

Use the alpha parameters we've extracted (alpha_est).

- d == delta (prediction error)
- v == value (i.e. the value for each stimulus)


```{r}

for (p in 1:nsub){
  for (t in 1:(ntrials-1)){
      dp[p,t] <- screamPlus[p,t]-vp[p,t]
      dm[p,t] <- screamMinus[p,t]-vm[p,t]
      vp[p,t+1] <- vp[p,t]+alpha_est[p,1]*dp[p,t]
      vm[p,t+1]<- vm[p,t]+alpha_est[p,1]*dm[p,t]
    }
}

```

For reference, the corresponding likelihood block from the Stan model (not run here):

    for (t in 1:ntrials){
      shape1_Plus[t,p] = VPlus[t,p] * ((VPlus[t,p] * (1-VPlus[t,p])) / beta[p,1]);
      shape1_Minus[t,p] = VMinus[t,p] * ((VMinus[t,p] * (1-VMinus[t,p])) / beta[p,2]);
      shape2_Plus[t,p] = (1-VPlus[t,p]) * ((VPlus[t,p] * (1-VPlus[t,p])) / beta[p,1]);
      shape2_Minus[t,p] = (1-VMinus[t,p]) * ((VMinus[t,p] * (1-VMinus[t,p])) / beta[p,2]);

      ratingsPlus[t,p] ~ beta(shape1_Plus[t,p],shape2_Plus[t,p]);
      ratingsMinus[t,p] ~ beta(shape1_Minus[t,p],shape2_Minus[t,p]);
    }
  }
}


###### populate beta parameter shape frames

Use the new v frames and beta parameters.

Shape 1 and 2 are sufficient parameters for the beta distribution

```{r}

for (p in 1:nsub){
  
  for (t in 1:ntrials){
    
      shape1p[p,t] = vp[p,t] * ((vp[p,t] * (1-vp[p,t])) / beta[p,1])
      shape1m[p,t] = vm[p,t] * ((vm[p,t] * (1-vm[p,t])) / beta[p,1])
      shape2p[p,t] = (1-vp[p,t]) * ((vp[p,t] * (1-vp[p,t])) / beta[p,1])
      shape2m[p,t] = (1-vm[p,t]) * ((vm[p,t] * (1-vm[p,t])) / beta[p,1])
      
  }
}
  

```


###### Estimate ratings


Using rbeta here to draw random samples from each trial's beta distribution (note: pbeta gives the cumulative distribution function, which is not what we need).

For now, taking the mean of 1000 draws per trial as the simulated rating...
```{r}


for (p in 1:nsub){
  
  for (t in 1:ntrials){
    
    rating_est_plus[p,t] <- mean(rbeta(1000,shape1p[p,t],shape2p[p,t]))
    rating_est_minus[p,t] <- mean(rbeta(1000,shape1m[p,t],shape2m[p,t]))
    
  }
}


```
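Averaging rbeta() draws approximates the analytic mean of the beta distribution, shape1/(shape1 + shape2); a deterministic alternative (a sketch, not what was run above) would be:

```{r}
# Analytic beta mean instead of averaging 1000 random draws.
beta_mean <- function(shape1, shape2) shape1 / (shape1 + shape2)

set.seed(1)
mc <- mean(rbeta(1e5, 2, 6))  # Monte Carlo estimate
an <- beta_mean(2, 6)         # analytic value, 0.25
abs(mc - an) < 0.01
```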

#### Rescale simulated ratings

You could argue that these should match the discrete nature of the original rating scale, which we effectively undid in our script. The following will enable this.

HOWEVER: we are reducing variance massively this way, so it might be better to leave the recovered ratings unscaled...

So - the following discrete values exist in our rescaled ratings:
```{r}

table(plus_scaled$X1)

```

We will make it so that anything falling within 0.05555556 above or below one of these values is set to that value. Note that this is our cdf_scale factor, which we used in the script to capture the full area under the curve for each segment of the distribution represented by the discrete ratings of 1-9.

Write the function to rescale

```{r}


scale_simulated <- function(x){
  
  scaled_list <- array(unique(plus_scaled$X1))
  
  for (val in scaled_list){
    if (x > val - cdf_scale & x < val + cdf_scale){
      x <- val
    }
  }
  return(x)
}

```
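As an alternative sketch, the per-value loop can be vectorised by snapping each simulated rating to the nearest discrete value (`snap_to_scale` is a hypothetical helper, not used above; it assumes the same set of discrete rescaled ratings):

```{r}
# Snap each value in x to the nearest element of scaled_list.
snap_to_scale <- function(x, scaled_list){
  scaled_list[max.col(-abs(outer(x, scaled_list, "-")))]
}

snap_to_scale(c(0.12, 0.48), scaled_list = c(0.1, 0.3, 0.5))  # 0.1 0.5
```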

apply it to the simulated rating frames.

(uncomment to run this)

```{r}
## initialise dataframes
# 
# est_plus_scaled <- data.frame(matrix(ncol=dim(rating_est_plus)[2],nrow = dim(rating_est_plus)[1]))
# est_minus_scaled <- data.frame(matrix(ncol=dim(rating_est_minus)[2],nrow = dim(rating_est_minus)[1]))
# 
# ##  populate with rescaled values
# 
# for (sub in 1:dim(rating_est_plus)[1]){
#   for (col in 1:dim(rating_est_plus)[2]){
#     
#     est_plus_scaled[sub,col] <- scale_simulated(rating_est_plus[sub,col])
#   }
# }
# 
# for (sub in 1:dim(rating_est_minus)[1]){
#   for (col in 1:dim(rating_est_minus)[2]){
#     
#     est_minus_scaled[sub,col] <- scale_simulated(rating_est_minus[sub,col])
#   }
# }


```

#### Correlate actual ratings with simulated ratings

Use the simulated ratings per person that we derived from our parameters and see how well they align with the real ratings...

Only showing the diagonals from corr.test (psych package) here, to get the important t1 x t1 etc. values.

This will use either the rating_est files (rating_est_plus; rating_est_minus) or the est_scaled files (est_minus_scaled; est_plus_scaled), depending on whether we opt to restore the scaling or not.

```{r}

print("real ratings with estimated ratings: CS MINUS")
diag(corr.test(rating_est_minus,minus_scaled)$r)

print("real ratings with estimated ratings: CS MINUS (average for all trials)")
cor(rowMeans(rating_est_minus),rowMeans(minus_scaled))


print("real ratings with estimated ratings: CS PLUS")
diag(corr.test(rating_est_plus,plus_scaled)$r)

print("real ratings with estimated ratings: CS PLUS (average for all trials)")
cor(rowMeans(rating_est_plus),rowMeans(plus_scaled))

```

### Recover

Here we are seeing if we can recover the same estimates using the simulated ratings. Basically run stan but using the estimated ratings instead of the real ones. See if we get the same alpha / beta parameters.

We might decide to use the rescaled estimates here to be more comparable...

#### run stan model

The mean-defined RL model with a single beta per person (the same model fitted above).


```{r mean recover set up}
#script
stanname='beta_meansd_RL_4.stan'

# data 
flare_data_rec <-list(ntrials=ntrials,nsub=nsub,screamPlus = t(screamPlus), screamMinus= t(screamMinus),ratingsPlus=t(rating_est_plus),ratingsMinus=t(rating_est_minus),cdf_scale=cdf_scale)


```


```{r mean recover}

flare_fit_rec <- model_run('full',stanname,flare_data_rec)

## get some basic output descriptions printed to screen
out_describe(flare_fit_rec,nsub)

```


#### Make alpha / beta datasets p/p

Use the summary of the stan model to extract the different parameters we want to try to use to recreate our data.

```{r}

params_rec <- summary(flare_fit_rec)
alpha_est_rec <- data.frame(params_rec$summary[1:nsub,1])
beta_rec <- data.frame(params_rec$summary[(nsub+1):(nsub*2),1])

names(beta_rec) <- "beta_rec"



```

#### Correlate original parameters with recovered parameters

Correlate the parameter estimates from the real ratings with those recovered from the simulated ratings to see how well they align...

Only showing the diagonals from corr.test (psych package) here.

```{r}

print("original with recovered: ALPHA")
diag(corr.test(alpha_est_rec,alpha_est)$r)

print("original with recovered: BETA")
diag(corr.test(beta_rec,beta)$r)



```


## Mean 2 beta
### Generate 
##### Make alpha / beta datasets p/p

Use the summary of the stan model to extract the different parameters we want to try to use to recreate our data.

```{r}

params <- summary(flare_fit_m2)
alpha_est <- data.frame(params$summary[1:nsub,1])
beta_plus <- data.frame(matrix(ncol = 1,nrow=nsub))
beta_minus <- data.frame(matrix(ncol = 1,nrow=nsub))

names(beta_plus) <- "beta_plus"
names(beta_minus) <- "beta_minus"
  
subp = 0
subm = 0
# NOTE: 343:1026 hardcodes (nsub+1):(nsub*3), i.e. assumes nsub = 342
  for ( i in 343:1026){
    
    if (i%%2 == 1){
      subp= subp+1
      beta_plus[subp,1] <- params$summary[i,1]
    } else if (i%%2 == 0) {
      subm= subm+1
      beta_minus[subm,1] <- params$summary[i,1] 
    }
    
}


```
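The hardcoded range 343:1026 corresponds to (nsub + 1):(nsub * 3) when nsub = 342, with CS+ and CS- rows interleaved. A generalised sketch without the magic numbers (a hypothetical `extract_betas` helper, assuming the same interleaved layout):

```{r}
# Pull interleaved CS+ / CS- betas from a vector of summary estimates.
extract_betas <- function(est, nsub){
  list(plus  = est[seq(nsub + 1, nsub * 3 - 1, by = 2)],
       minus = est[seq(nsub + 2, nsub * 3,     by = 2)])
}

# Toy check with nsub = 2: rows 3-6 interleave plus and minus.
extract_betas(c(10, 20, 1.1, 2.1, 1.2, 2.2), nsub = 2)
```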

##### Initialise empty datasets to hold the predicted ratings

```{r}

rating_est_plus <- data.frame(matrix(ncol=ntrials,nrow=nsub))
rating_est_minus <- data.frame(matrix(ncol=ntrials,nrow=nsub))

# beta shape parameters
shape1p <- data.frame(matrix(ncol=ntrials,nrow=nsub))
shape1m <- data.frame(matrix(ncol=ntrials,nrow=nsub))
shape2p <- data.frame(matrix(ncol=ntrials,nrow=nsub))
shape2m <- data.frame(matrix(ncol=ntrials,nrow=nsub))

# V parameters (initialised at 0.5)
vp <- data.frame(matrix(ncol=ntrials,nrow=nsub))
vm <- data.frame(matrix(ncol=ntrials,nrow=nsub))

vp[1] <- 0.5
vm[1] <- 0.5

# prediction error 

dp <- data.frame(matrix(ncol=(ntrials-1),nrow=nsub)) 
dm <- data.frame(matrix(ncol=(ntrials-1),nrow=nsub)) 

  
```

##### Simulate ratings

Use our extracted parameters in place of estimating them, following the Stan syntax.


######  Populate our vplus and delta frames

Use the alpha parameters we've extracted (alpha_est).

- d == delta (prediction error)
- v == value (i.e. the value for each stimulus)


```{r}

for (p in 1:nsub){
  for (t in 1:(ntrials-1)){
      dp[p,t] <- screamPlus[p,t]-vp[p,t]
      dm[p,t] <- screamMinus[p,t]-vm[p,t]
      vp[p,t+1] <- vp[p,t]+alpha_est[p,1]*dp[p,t]
      vm[p,t+1]<- vm[p,t]+alpha_est[p,1]*dm[p,t]
    }
}

```

For reference, the corresponding likelihood block from the Stan model (not run here):

    for (t in 1:ntrials){
      shape1_Plus[t,p] = VPlus[t,p] * ((VPlus[t,p] * (1-VPlus[t,p])) / beta[p,1]);
      shape1_Minus[t,p] = VMinus[t,p] * ((VMinus[t,p] * (1-VMinus[t,p])) / beta[p,2]);
      shape2_Plus[t,p] = (1-VPlus[t,p]) * ((VPlus[t,p] * (1-VPlus[t,p])) / beta[p,1]);
      shape2_Minus[t,p] = (1-VMinus[t,p]) * ((VMinus[t,p] * (1-VMinus[t,p])) / beta[p,2]);

      ratingsPlus[t,p] ~ beta(shape1_Plus[t,p],shape2_Plus[t,p]);
      ratingsMinus[t,p] ~ beta(shape1_Minus[t,p],shape2_Minus[t,p]);
    }
  }
}


###### populate beta parameter shape frames

Use the new v frames and beta parameters.

Shape 1 and 2 are sufficient parameters for the beta distribution

```{r}

for (p in 1:nsub){
  
  for (t in 1:ntrials){
    
      shape1p[p,t] = vp[p,t] * ((vp[p,t] * (1-vp[p,t])) / beta_plus[p,1])
      shape1m[p,t] = vm[p,t] * ((vm[p,t] * (1-vm[p,t])) / beta_minus[p,1])
      shape2p[p,t] = (1-vp[p,t]) * ((vp[p,t] * (1-vp[p,t])) / beta_plus[p,1])
      shape2m[p,t] = (1-vm[p,t]) * ((vm[p,t] * (1-vm[p,t])) / beta_minus[p,1])
      
  }
}
  

```


###### Estimate ratings


Using rbeta here to draw random samples from each trial's beta distribution (note: pbeta gives the cumulative distribution function, which is not what we need).

For now, taking the mean of 1000 draws per trial as the simulated rating...
```{r}


for (p in 1:nsub){
  
  for (t in 1:ntrials){
    
    rating_est_plus[p,t] <- mean(rbeta(1000,shape1p[p,t],shape2p[p,t]))
    rating_est_minus[p,t] <- mean(rbeta(1000,shape1m[p,t],shape2m[p,t]))
    
  }
}


```

#### Correlate actual ratings with simulated ratings

Use the simulated ratings per person that we derived from our parameters and see how well they align with the real ratings...

Only showing the diagonals from corr.test (psych package) here, to get the important t1 x t1 etc. values.

```{r}

print("real ratings with estimated ratings: CS MINUS")
diag(corr.test(rating_est_minus,minus_scaled)$r)

print("real ratings with estimated ratings: CS MINUS (average for all trials)")
cor(rowMeans(rating_est_minus),rowMeans(minus_scaled))


print("real ratings with estimated ratings: CS PLUS")
diag(corr.test(rating_est_plus,plus_scaled)$r)

print("real ratings with estimated ratings: CS PLUS (average for all trials)")
cor(rowMeans(rating_est_plus),rowMeans(plus_scaled))


```

### Recover

Here we are seeing if we can recover the same estimates using the simulated ratings. Basically run stan but using the estimated ratings instead of the real ones. See if we get the same alpha / beta parameters.

#### rescale the estimated ratings


Rescale the 1-9 expectancy values to be on a 0-1 scale.

Stan cannot deal with the extreme limits of the beta distribution, so make the rescaled limits just above 0 and just below 1.

Note that when a value had to be imputed because it was missing, it will not be an integer; the function therefore needs to allow for ranges between values.

```{r}
# 
# minus_scaled_est <- data.frame(matrix(ncol=dim(rating_est_minus)[2],nrow = dim(rating_est_minus)[1]))
# 
# ##  populate with rescaled values
# 
# for (sub in 1:dim(rating_est_minus)[1]){
#   for (col in 1:dim(rating_est_minus)[2]){
#     
#     minus_scaled_est[sub,col] <- scale_flare(rating_est_minus[sub,col])
#   }
# }
# 
# ## ditto for plus
# 
# plus_scaled_est <- data.frame(matrix(ncol=dim(rating_est_plus)[2],nrow = dim(rating_est_plus)[1]))
# 
# ##  populate with rescaled values
# 
# for (sub in 1:dim(rating_est_plus)[1]){
#   for (col in 1:dim(rating_est_plus)[2]){
#     
#     plus_scaled_est[sub,col] <- scale_flare(rating_est_plus[sub,col])
#   }
# }
# 


```

#### run stan model

RL model adding a beta per stimulus to Alex's model


```{r mean 2 beta recover set up}
#script
stanname='beta_meansd_2beta_RL.stan'

# data 
flare_data_rec <-list(ntrials=ntrials,nsub=nsub,screamPlus = t(screamPlus), screamMinus= t(screamMinus),ratingsPlus=t(rating_est_plus),ratingsMinus=t(rating_est_minus),cdf_scale=cdf_scale)



```


```{r mean 2 beta recover}

flare_fit_rec <- model_run('full',stanname,flare_data_rec)

## get some basic output descriptions printed to screen
out_describe(flare_fit_rec,nsub)


```


#### Make alpha / beta datasets p/p

Use the summary of the stan model to extract the different parameters we want to try to use to recreate our data.

```{r}

params_rec <- summary(flare_fit_rec)
alpha_est_rec <- data.frame(params_rec$summary[1:nsub,1])
beta_plus_rec <- data.frame(matrix(ncol = 1,nrow=nsub))
beta_minus_rec <- data.frame(matrix(ncol = 1,nrow=nsub))

names(beta_plus_rec) <- "beta_plus"
names(beta_minus_rec) <- "beta_minus"
  
subp = 0
subm = 0
# NOTE: 343:1026 hardcodes (nsub+1):(nsub*3), i.e. assumes nsub = 342
  for ( i in 343:1026){
    
    if (i%%2 == 1){
      subp= subp+1
      beta_plus_rec[subp,1] <- params_rec$summary[i,1]
    } else if (i%%2 == 0) {
      subm= subm+1
      beta_minus_rec[subm,1] <- params_rec$summary[i,1] 
    }
    
}


```

#### Correlate original parameters with recovered parameters

Correlate the parameter estimates from the real ratings with those recovered from the simulated ratings to see how well they align...

Only showing the diagonals from corr.test (psych package) here.

```{r}

print("original with recovered: ALPHA")
diag(corr.test(alpha_est_rec,alpha_est)$r)

print("original with recovered: BETA PLUS")
diag(corr.test(beta_plus_rec,beta_plus)$r)

print("original with recovered: BETA MINUS")
diag(corr.test(beta_minus_rec,beta_minus)$r)

```


# Expanding on the best base model

# Potentially interesting parameters to add to best fit model

## Model 8: Punishment sensitivity 

How aversive they find the scream reinforcement. Modelling this on the loss aversion parameter in [Charpentier et al](https://f1000.com/work/item/6707134/resources/5864774/pdf) (see the last page before the references).

This will be a single parameter per person, representing how much the scream influences their ratings.

Based on the paper, we will model this in Stan by including it in our value calculations for the CS+ and CS- respectively: we let it scale how much the prediction error changes depending on whether a scream occurred. The prediction error is later used to update the value rating per stimulus:


$$d(stimulus,trial) = scream*\lambda-v(stimulus,trial-1)$$

where
$$\lambda = \text{sensitivity to screams}$$
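A quick worked example of the lambda-scaled prediction error (hypothetical values):

```{r}
# With lambda > 1, a scream produces a larger prediction error.
v      <- 0.4   # current expectancy
scream <- 1     # scream occurred this trial
lambda <- 1.5   # punishment sensitivity (hypothetical)

scream * lambda - v  # delta = 1.1, vs 0.6 without lambda scaling
```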

### Model

```{r punishment sensitivity set up}
#script
stanname='beta_mean1beta_PunSens.stan'

```


```{r punishment sensitivity}

flare_fit <- model_run('full',stanname,flare_data)

## get some basic output descriptions printed to screen
out_describe(flare_fit,nsub)

# extract fit data
summary_flare <- summary(flare_fit)

```


#### Create BIC from log likelihood

```{r}
## extract log likelihood

flare_loglike <- extract_log_lik(flare_fit, parameter_name = "loglik", merge_chains = TRUE)

#calculate BIC

FLARe_bic<-bic(ntrials,-colMeans(flare_loglike),3) # number of parameters in this model (alpha, beta and lambda = 3)

## mean BIC as model comparisons tool:

print("Mean Bayesian information criterion for model")

mean(FLARe_bic)

```

#### Add to bar plot

```{r}

mod_comp <- rbind(na.omit(mod_comp),c("Punishment sensitivity",mean(FLARe_bic)))

mod_comp$BIC <- odp(as.numeric(mod_comp$BIC))

mod_comp <- as.data.frame(na.omit(mod_comp))


## plot function - create plot
plot_models(mod_comp)

```


### Generate
##### Make alpha / beta datasets p/p

Use the summary of the stan model to extract the different parameters we want to try to use to recreate our data.

```{r}

params <- summary(flare_fit)
alpha_est <- data.frame(params$summary[1:nsub,1])
beta <- data.frame(params$summary[(nsub+1):(nsub*2),1])
lambda <- data.frame(params$summary[(nsub*3+1):(nsub*4),1])

names(alpha_est) <- "alpha"
names(beta) <- "beta"
names(lambda) <- "lambda"
  

```

##### Initialise empty datasets to hold the predicted ratings

```{r}

rating_est_plus <- data.frame(matrix(ncol=ntrials,nrow=nsub))
rating_est_minus <- data.frame(matrix(ncol=ntrials,nrow=nsub))

# beta shape parameters
shape1p <- data.frame(matrix(ncol=ntrials,nrow=nsub))
shape1m <- data.frame(matrix(ncol=ntrials,nrow=nsub))
shape2p <- data.frame(matrix(ncol=ntrials,nrow=nsub))
shape2m <- data.frame(matrix(ncol=ntrials,nrow=nsub))

# V parameters (initialised at random values around 0.5; rnorm with sd = 0.025)

vp <- data.frame(matrix(ncol=ntrials,nrow=nsub))
vm <- data.frame(matrix(ncol=ntrials,nrow=nsub))

vp[1] <- rnorm(nsub,0.5,0.025)
vm[1] <- rnorm(nsub,0.5,0.025)


# prediction error 

dp <- data.frame(matrix(ncol=(ntrials-1),nrow=nsub)) 
dm <- data.frame(matrix(ncol=(ntrials-1),nrow=nsub)) 

  
```

##### Simulate ratings

Use our extracted parameters in place of estimating them, following the Stan syntax.


######  Populate our vplus and delta frames

Use the alpha parameters we've extracted (alpha_est).

- d == delta (prediction error)
- v == value (i.e. the value for each stimulus)


```{r}

for (p in 1:nsub){
  for (t in 1:(ntrials-1)){
      dp[p,t] <- screamPlus[p,t]*lambda[p,]-vp[p,t]
      dm[p,t] <- screamMinus[p,t]*lambda[p,]-vm[p,t]
      vp[p,t+1] <- vp[p,t]+alpha_est[p,1]*dp[p,t]
      vm[p,t+1]<- vm[p,t]+alpha_est[p,1]*dm[p,t]
    }
}

```

For reference, the corresponding likelihood block from the Stan model (not run here):

    for (t in 1:ntrials){
      shape1_Plus[t,p] = VPlus[t,p] * ((VPlus[t,p] * (1-VPlus[t,p])) / beta[p,1]);
      shape1_Minus[t,p] = VMinus[t,p] * ((VMinus[t,p] * (1-VMinus[t,p])) / beta[p,2]);
      shape2_Plus[t,p] = (1-VPlus[t,p]) * ((VPlus[t,p] * (1-VPlus[t,p])) / beta[p,1]);
      shape2_Minus[t,p] = (1-VMinus[t,p]) * ((VMinus[t,p] * (1-VMinus[t,p])) / beta[p,2]);

      ratingsPlus[t,p] ~ beta(shape1_Plus[t,p],shape2_Plus[t,p]);
      ratingsMinus[t,p] ~ beta(shape1_Minus[t,p],shape2_Minus[t,p]);
    }
  }
}


###### populate beta parameter shape frames

Use the new v frames and beta parameters.

Shape 1 and 2 are sufficient parameters for the beta distribution

```{r}

for (p in 1:nsub){
  
  for (t in 1:ntrials){
    
      shape1p[p,t] = vp[p,t] * ((vp[p,t] * (1-vp[p,t])) / beta[p,1])
      shape1m[p,t] = vm[p,t] * ((vm[p,t] * (1-vm[p,t])) / beta[p,1])
      shape2p[p,t] = (1-vp[p,t]) * ((vp[p,t] * (1-vp[p,t])) / beta[p,1])
      shape2m[p,t] = (1-vm[p,t]) * ((vm[p,t] * (1-vm[p,t])) / beta[p,1])
      
  }
}
  

```


###### Estimate ratings


Using rbeta here to draw random samples from each trial's beta distribution (note: pbeta gives the cumulative distribution function, which is not what we need).

For now, taking the mean of 1000 draws per trial as the simulated rating...
```{r}


for (p in 1:nsub){
  
  for (t in 1:ntrials){
    
    rating_est_plus[p,t] <- mean(rbeta(1000,shape1p[p,t],shape2p[p,t]))
    rating_est_minus[p,t] <- mean(rbeta(1000,shape1m[p,t],shape2m[p,t]))
    
  }
}


```

#### Correlate actual ratings with simulated ratings

Use the simulated ratings per person that we derived from our parameters and see how well they align with the real ratings...

Only showing the diagonals from corr.test (psych package) here, to get the important t1 x t1 etc. values.

```{r}

print("real ratings with estimated ratings: CS MINUS")
diag(corr.test(rating_est_minus,minus_scaled)$r)

print("real ratings with estimated ratings: CS MINUS (average for all trials)")
cor(rowMeans(rating_est_minus),rowMeans(minus_scaled))


print("real ratings with estimated ratings: CS PLUS")
diag(corr.test(rating_est_plus,plus_scaled)$r)

print("real ratings with estimated ratings: CS PLUS (average for all trials)")
cor(rowMeans(rating_est_plus),rowMeans(plus_scaled))


```

### Recover

Here we are seeing if we can recover the same estimates using the simulated ratings. Basically run stan but using the estimated ratings instead of the real ones. See if we get the same alpha / beta parameters.

#### run stan model

```{r punishment sensitivity recover set up}
#script
stanname='beta_mean1beta_PunSens.stan'

# data 
flare_data_rec<-list(ntrials=ntrials,nsub=nsub,screamPlus = t(screamPlus), screamMinus= t(screamMinus),ratingsPlus=t(rating_est_plus),ratingsMinus=t(rating_est_minus),cdf_scale=cdf_scale)
```


```{r punishment sensitivity recover}

flare_fit_rec <- model_run('full',stanname,flare_data_rec)

## get some basic output descriptions printed to screen
out_describe(flare_fit_rec,nsub)

```


#### Make alpha / beta datasets p/p

Use the summary of the stan model to extract the different parameters we want to try to use to recreate our data.

```{r}

params_rec <- summary(flare_fit_rec)
alpha_est_rec <- data.frame(params_rec$summary[1:nsub,1])
beta_rec <- data.frame(params_rec$summary[(nsub+1):(nsub*2),1])
lambda_rec <- data.frame(params_rec$summary[(nsub*3+1):(nsub*4),1])


```

#### Correlate original parameters with recovered parameters

Correlate the parameter estimates from the real ratings with those recovered from the simulated ratings to see how well they align...

Only showing the diagonals from corr.test (psych package) here.

```{r}

print("original with recovered: ALPHA")
diag(corr.test(alpha_est_rec,alpha_est)$r)

print("original with recovered: BETA")
diag(corr.test(beta_rec,beta)$r)

print("original with recovered: LAMBDA")
diag(corr.test(lambda_rec,lambda)$r)

```

Beta is very poorly recovered here; alpha and lambda are recovered exceptionally well.

Will try a quick 2-beta model with punishment sensitivity to see if this improves things.



## Model 9: Punishment sensitivity 2 beta


### Model

```{r punishment sensitivity 2 beta set up}
#script
stanname='beta_mean1beta_PunSens2Beta.stan'

```


```{r punishment sensitivity 2 beta}

flare_fit <- model_run('full',stanname,flare_data)

## get some basic output descriptions printed to screen
out_describe(flare_fit,nsub)

# extract fit data
summary_flare <- summary(flare_fit)

```


#### Create BIC from log likelihood

```{r}
## extract log likelihood

flare_loglike <- extract_log_lik(flare_fit, parameter_name = "loglik", merge_chains = TRUE)

#calculate BIC

FLARe_bic<-bic(ntrials,-colMeans(flare_loglike),4) # number of parameters in this model (alpha, two betas and lambda = 4)

## mean BIC as model comparisons tool:

print("Mean Bayesian information criterion for model")

mean(FLARe_bic)

```

#### Add to bar plot

```{r}

mod_comp <- rbind(na.omit(mod_comp),c("Punishment sensitivity 2 beta",mean(FLARe_bic)))

mod_comp$BIC <- odp(as.numeric(mod_comp$BIC))

mod_comp <- as.data.frame(na.omit(mod_comp))


## plot function - create plot
plot_models(mod_comp)

```

### Generate
##### Make alpha / beta datasets p/p

Use the summary of the stan model to extract the different parameters we want to try to use to recreate our data.

```{r}

params <- summary(flare_fit)
alpha_est <- data.frame(params$summary[1:nsub,1])
beta_all <- data.frame(params$summary[(nsub+1):(nsub*3),1])
lambda <- data.frame(params$summary[(nsub*4+1):(nsub*5),1])

# divide beta into the two...

## p is 1, so the odd rows
beta_p <- data.frame(beta_all[ c(TRUE,FALSE), ]) # odd rows
beta_m <- data.frame(beta_all[ !c(TRUE,FALSE), ]) # even rows

names(alpha_est) <- "alpha"
names(beta_p) <- "beta_plus"
names(beta_m) <- "beta_minus"
names(lambda) <- "lambda"
  

```

##### Initialise empty datasets to hold the predicted ratings

```{r}

rating_est_plus <- data.frame(matrix(ncol=ntrials,nrow=nsub))
rating_est_minus <- data.frame(matrix(ncol=ntrials,nrow=nsub))

# beta shape parameters
shape1p <- data.frame(matrix(ncol=ntrials,nrow=nsub))
shape1m <- data.frame(matrix(ncol=ntrials,nrow=nsub))
shape2p <- data.frame(matrix(ncol=ntrials,nrow=nsub))
shape2m <- data.frame(matrix(ncol=ntrials,nrow=nsub))

# V parameters (initialised at random values around 0.5; rnorm with sd = 0.025)

vp <- data.frame(matrix(ncol=ntrials,nrow=nsub))
vm <- data.frame(matrix(ncol=ntrials,nrow=nsub))

vp[1] <- rnorm(nsub,0.5,0.025)
vm[1] <- rnorm(nsub,0.5,0.025)


# prediction error 

dp <- data.frame(matrix(ncol=(ntrials-1),nrow=nsub)) 
dm <- data.frame(matrix(ncol=(ntrials-1),nrow=nsub)) 

  
```

##### Simulate ratings

Use our extracted parameters in place of estimating them, following the Stan syntax.


######  Populate our vplus and delta frames

Use the alpha parameters we've extracted (alpha_est).

- d == delta (prediction error)
- v == value (i.e. the value for each stimulus)


```{r}

for (p in 1:nsub){
  for (t in 1:(ntrials-1)){
      dp[p,t] <- screamPlus[p,t]*lambda[p,]-vp[p,t]
      dm[p,t] <- screamMinus[p,t]*lambda[p,]-vm[p,t]
      vp[p,t+1] <- vp[p,t]+alpha_est[p,1]*dp[p,t]
      vm[p,t+1]<- vm[p,t]+alpha_est[p,1]*dm[p,t]
    }
}

```

For reference, the corresponding likelihood block from the Stan model (not run here):

    for (t in 1:ntrials){
      shape1_Plus[t,p] = VPlus[t,p] * ((VPlus[t,p] * (1-VPlus[t,p])) / beta[p,1]);
      shape1_Minus[t,p] = VMinus[t,p] * ((VMinus[t,p] * (1-VMinus[t,p])) / beta[p,2]);
      shape2_Plus[t,p] = (1-VPlus[t,p]) * ((VPlus[t,p] * (1-VPlus[t,p])) / beta[p,1]);
      shape2_Minus[t,p] = (1-VMinus[t,p]) * ((VMinus[t,p] * (1-VMinus[t,p])) / beta[p,2]);

      ratingsPlus[t,p] ~ beta(shape1_Plus[t,p],shape2_Plus[t,p]);
      ratingsMinus[t,p] ~ beta(shape1_Minus[t,p],shape2_Minus[t,p]);
    }
  }
}


###### populate beta parameter shape frames

Use the new v frames and beta parameters.

Shape 1 and 2 are sufficient parameters for the beta distribution

```{r}

for (p in 1:nsub){
  
  for (t in 1:ntrials){
    
      shape1p[p,t] = vp[p,t] * ((vp[p,t] * (1-vp[p,t])) / beta_p[p,1])
      shape2p[p,t] = (1-vp[p,t]) * ((vp[p,t] * (1-vp[p,t])) / beta_p[p,1])
      shape1m[p,t] = vm[p,t] * ((vm[p,t] * (1-vm[p,t])) / beta_m[p,1])
      shape2m[p,t] = (1-vm[p,t]) * ((vm[p,t] * (1-vm[p,t])) / beta_m[p,1])
      
  }
}
  

```


###### Estimate ratings


Using rbeta here to draw random samples from each trial's beta distribution (note: pbeta gives the cumulative distribution function, which is not what we need).

For now, taking the mean of 1000 draws per trial as the simulated rating...
```{r}


for (p in 1:nsub){
  
  for (t in 1:ntrials){
    
    rating_est_plus[p,t] <- mean(rbeta(1000,shape1p[p,t],shape2p[p,t]))
    rating_est_minus[p,t] <- mean(rbeta(1000,shape1m[p,t],shape2m[p,t]))
    
  }
}


```
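Since the mean of Beta(shape1, shape2) is shape1 / (shape1 + shape2), the sampling-and-averaging step above could be replaced by the analytic mean (toy shapes for illustration):

```{r}
# the mean of Beta(shape1, shape2) is shape1 / (shape1 + shape2), so the
# point estimate can be computed analytically instead of averaging draws
shape1 <- 2; shape2 <- 6
analytic <- shape1 / (shape1 + shape2)       # 0.25
set.seed(1)
sampled <- mean(rbeta(1e5, shape1, shape2))  # approximately 0.25
```

Given the parameterisation above, this analytic mean is simply v, so the simulated point estimates should track the value trajectories directly.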

#### Correlate actual ratings with simulated ratings

Use the simulated ratings per person that we derived using our parameters and see how well they align with the real ratings.

Only showing the diagonals from the `corr.test` function (psych package) here to get the important t1 x t1 etc. values.

```{r}

print("real ratings with estimated ratings: CS MINUS")
diag(corr.test(rating_est_minus,minus_scaled)$r)

print("real ratings with estimated ratings: CS MINUS (average for all trials)")
cor(rowMeans(rating_est_minus),rowMeans(minus_scaled))


print("real ratings with estimated ratings: CS PLUS")
diag(corr.test(rating_est_plus,plus_scaled)$r)

print("real ratings with estimated ratings: CS PLUS (average for all trials)")
cor(rowMeans(rating_est_plus),rowMeans(plus_scaled))


```
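The diagonal trick generalises: for two matrices with matched columns, the diagonal of the cross-correlation matrix is the set of columnwise (trial t with trial t) correlations. A toy example with base `cor`:

```{r}
# toy example: diag() of a cross-correlation matrix gives the
# column-with-matching-column (trial t with trial t) correlations
set.seed(1)
a <- matrix(rnorm(40), ncol = 4)                # 10 'subjects' x 4 'trials'
b <- a + matrix(rnorm(40, sd = 0.1), ncol = 4)  # noisy copy of a
diag(cor(a, b))                                 # one correlation per trial
```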


The generated ratings are worse with this model overall.

### Recover

Here we are seeing if we can recover the same estimates using the simulated ratings. Basically run stan but using the estimated ratings instead of the real ones. See if we get the same alpha / beta parameters.

#### run stan model

```{r punishment sensitivity 2beta recover set up}
#script
stanname='beta_mean1beta_PunSens2Beta.stan'

# data 
flare_data_rec<-list(ntrials=ntrials,nsub=nsub,screamPlus = t(screamPlus), screamMinus= t(screamMinus),ratingsPlus=t(rating_est_plus),ratingsMinus=t(rating_est_minus),cdf_scale=cdf_scale)
```


```{r punishment sensitivity 2 beta recover}

flare_fit_rec <- model_run('full',stanname,flare_data_rec)

## get some basic output descriptions printed to screen
out_describe(flare_fit_rec,nsub)

```


#### Make alpha / beta datasets p/p

Use the summary of the stan model to extract the different parameters we want to try to use to recreate our data.

```{r}


params_rec <- summary(flare_fit_rec)
alpha_est_rec <- data.frame(params_rec$summary[1:nsub,1])
beta_all_rec <- data.frame(params_rec$summary[(nsub+1):(nsub*3),1])
lambda_rec <- data.frame(params_rec$summary[(nsub*4+1):(nsub*5),1])

# divide beta into the two...

## p is 1, so the odd rows
beta_p_rec <- data.frame(beta_all_rec[ c(TRUE,FALSE), ]) # odd rows
beta_m_rec <- data.frame(beta_all_rec[ !c(TRUE,FALSE), ]) # even rows

names(alpha_est_rec) <- "alpha"
names(beta_p_rec) <- "beta_plus"
names(beta_m_rec) <- "beta_minus"
names(lambda_rec) <- "lambda"
  

```
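The odd/even split above relies on R recycling the logical vector `c(TRUE, FALSE)` down the rows; a toy illustration:

```{r}
# logical recycling: c(TRUE, FALSE) repeats down the rows, selecting the
# odd rows; its negation selects the even rows
x <- data.frame(val = 1:6)
odd  <- x[c(TRUE, FALSE), ]   # rows 1, 3, 5
even <- x[!c(TRUE, FALSE), ]  # rows 2, 4, 6
```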

#### Correlate original parameters with recovered parameters

Compare the parameters recovered from the simulated ratings with the original estimates to see how well they align.

Only showing the diagonals from `corr.test` here to get the important alpha x alpha etc. values.

```{r}

print("original with recovered: ALPHA")
diag(corr.test(alpha_est_rec,alpha_est)$r)

print("original with recovered: BETA PLUS")
diag(corr.test(beta_p_rec,beta_p)$r)

print("original with recovered: BETA MINUS")
diag(corr.test(beta_m_rec,beta_m)$r)

print("original with recovered: LAMBDA")
diag(corr.test(lambda_rec,lambda)$r)

```

Interestingly, recovery is much worse for lambda, and not great for beta; only alpha remains ok. So overall the 1-beta model is probably better, even though its BIC is *slightly* worse.

# Dual learning 

## Model 10: dual learning + punishment sensitivity ( 1 beta)

A la the Toby paper: a dual learning model.

Allows each stimulus value to update based on the other stimulus's outcomes.

This version also includes the punishment sensitivity multiplier, as that was the best model so far.


### Model

```{r dual learn set up}
#script
stanname='beta_DualLearn_PunSens_1beta.stan'

```


```{r dual learn}

flare_fit <- model_run('min',stanname,flare_data)

## get some basic output descriptions printed to screen
out_describe(flare_fit,nsub)

# extract fit data
summary_flare <- summary(flare_fit)

```


#### Create BIC from log likelihood

```{r}
## extract log likelihood

flare_loglike <- extract_log_lik(flare_fit, parameter_name = "loglik", merge_chains = TRUE)

#calculate BIC

FLARe_bic<-bic(ntrials,-colMeans(flare_loglike),2) # third argument = number of free parameters in this model

## mean BIC as model comparisons tool:

print("Mean Bayesian information criterion for model")

mean(FLARe_bic)

```
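The `bic()` helper is presumably defined earlier in the lab book; for reference, a minimal sketch consistent with how it is called here, assuming the standard BIC = k * ln(n) + 2 * NLL applied per subject (named `bic_sketch` to avoid masking the real helper):

```{r}
# hypothetical sketch of a bic() helper consistent with the calls above,
# assuming BIC = k * log(n) + 2 * NLL per subject
bic_sketch <- function(ntrials, nll, k) {
  k * log(ntrials) + 2 * nll
}
bic_sketch(24, 10, 2)   # single-subject example
```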

#### Add to bar plot

```{r}

mod_comp <- rbind(na.omit(mod_comp),c("Dual Learning",mean(FLARe_bic)))

mod_comp$BIC <- odp(as.numeric(mod_comp$BIC))

mod_comp <- as.data.frame(na.omit(mod_comp))


## plot function - create plot
plot_models(mod_comp)

```



# Rating consistency

A parameter that represents the rating consistency across multiple repeated / similar trials. I think it would be best to have one each for the CS+ and CS-, given these differ in terms of how similar the trials are (the CS- is always unreinforced, for example). I can imagine consistency is a parameter that is consistent regardless of reinforcement / stimulus type though, especially in later phases, so it is worth testing both models.

A similar parameter is [used in the charpentier et al. paper](https://f1000.com/work/item/6707134/resources/5864774/pdf) (see the last page before the references).

We will estimate this parameter as a factor that influences the overall shape of the rating probability distribution (beta distribution). It will do this via the sufficient parameters, which are influenced by stimulus value etc. per trial.

**note very unsure about this - need to check it out with Alex**

$$shape1(stimulus) = \left(1 + \exp\left(-\mu \cdot VPlus[t,p] \cdot \frac{VPlus[t,p]\,(1-VPlus[t,p])}{beta[p,1]}\right)\right)^{-1}$$
where $\mu$ is the logit sensitivity.

Here, logit sensitivity effectively captures rating consistency; higher values should mean greater consistency.
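As a toy numerical sketch of the proposed transform (illustrative values only; note that for positive inputs the logistic bounds shape1 within (0.5, 1), which may or may not be intended, so worth raising with Alex):

```{r}
# toy numerical sketch of the proposed consistency transform:
# shape1 = (1 + exp(-mu * v * (v * (1 - v) / beta)))^-1
v   <- 0.7     # stimulus value
b   <- 0.5     # dispersion parameter
mu  <- 0.5     # logit sensitivity (higher = more consistent)
raw <- v * (v * (1 - v) / b)       # untransformed shape term
shape1 <- (1 + exp(-mu * raw))^-1  # squashed through the logistic
```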


### Model

```{r consistency set up}
#script
stanname='beta_mean1beta_Consistency.stan'

```


```{r consistency}

flare_fit <- model_run('min',stanname,flare_data)

## get some basic output descriptions printed to screen
out_describe(flare_fit,nsub)

# extract fit data
summary_flare <- summary(flare_fit)

```


BIC is terrible


#### Create BIC from log likelihood

```{r}
## extract log likelihood

flare_loglike <- extract_log_lik(flare_fit, parameter_name = "loglik", merge_chains = TRUE)

#calculate BIC

FLARe_bic<-bic(ntrials,-colMeans(flare_loglike),2) # third argument = number of free parameters in this model

## mean BIC as model comparisons tool:

print("Mean Bayesian information criterion for model")
mean(FLARe_bic)

```

#### Add to bar plot

```{r}

mod_comp <- rbind(na.omit(mod_comp),c("Consistency",mean(FLARe_bic)))


mod_comp$BIC <- odp(as.numeric(mod_comp$BIC))

mod_comp <- as.data.frame(na.omit(mod_comp))


## plot function - create plot
plot_models(mod_comp)

```


# Generalisation

# NOTE TO SELF:

Maybe generalise the prediction error NOT the value (so add some of deltaP to the VMinus calculation...).

## some intro

Basically here we want to capture a parameter that estimates how much the learning from the reinforced stimulus influences responses to the 'safe' stimulus.

Basing my first effort on [Norbury, Robbins & Seymour, 2018](https://f1000.com/work/item/5243419/resources/6038039/pdf), and their finding that there is generalisation based on value and perceptual processes.

From their abstract 

>> "We found that generalization of avoidance could be parsed into perceptual and value-based processes, and further, that value-based generalization could be subdivided into that relating to aversive and neutral feedback".... "Further, generalization from aversive, but not neutral, feedback was associated with self-reported anxiety and intrusive thoughts. These results reveal a set of distinct mechanisms that mediate generalization in avoidance learning, and show how specific individual differences within them can yield anxiety. "

from introduction

>> "It is therefore possible that under-generalization of safety cues, as opposed to over-generalization of aversive cues, might be a contributing factor to susceptibility to disorders such as generalized anxiety"


## Equations

I will basically replicate the visual and value generalisation equations.

### Visual (i.e. possible identity confusion between the two stimuli):

$$V_{C,m} = 0.80 \cdot V_{C,m} + 0.20 \cdot V_{C,p}$$
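A one-line numerical sketch of this mixing step (toy values): each stimulus keeps 80% of its own value and absorbs 20% of the other's.

```{r}
# toy sketch of the visual-generalisation mixing step
v_minus <- 0.2
v_plus  <- 0.9
v_minus_mixed <- 0.80 * v_minus + 0.20 * v_plus  # 0.34
```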




### Value:

$$G_s = 1/\exp\left((\rho_o - \rho_s)^2 / (2\delta^2)\right)$$
where $s$ is the current stimulus and $o$ is the other stimulus. $\rho$ is the parameter governing shape 'spikiness'; $\delta$ is a free parameter that governs the width of the Gaussian function governing generalisation. This parameter should probably differ depending on trial outcome (scream or neutral): $\delta_s$ and $\delta_n$.

The authors update their value by multiplying it by the generalisation of the current 'state' (stimulus), i.e. $\times G_s$ as the last term.

First we will use a single $\delta$ value, rather than varying it per trial depending on whether the outcome is scream + or -.


So, for ours the following will be added to the punishment sensitivity single beta model.

### One generalisation parameter

Generalisation:

$$G = 1/exp((\rho_m - \rho_p)^2 / (2*\delta^2))$$

### generalisation per stimulus

Generalisation plus:

$$G_p = 1/exp((\rho_m - \rho_p)^2 / (2*\delta^2))$$

Generalisation minus:

$$G_m = 1/exp((\rho_p - \rho_m)^2 / (2*\delta^2))$$
Value plus:

$$VPlus[t+1,p]=VPlus[t,p]+alpha[p]*PredErrorPlus[t,p]*G_p$$

Value minus:

$$VMinus[t+1,p]=VMinus[t,p]+alpha[p]*PredErrorMinus[t,p]*G_m$$
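A numerical sketch of the generalisation weight and the resulting update (toy values for rho, delta and alpha; note $1/\exp(x)$ is just $\exp(-x)$, so G shrinks toward 0 as the stimuli sit further apart in rho-space):

```{r}
# toy sketch of the Gaussian generalisation weight and value update
rho_p <- 1.0; rho_m <- 0.0          # assumed stimulus positions
delta <- 0.5                        # width of the generalisation function
G <- 1 / exp((rho_m - rho_p)^2 / (2 * delta^2))  # = exp(-2) here
v_plus <- 0.5; alpha <- 0.2; pred_err <- 0.5
v_plus_new <- v_plus + alpha * pred_err * G      # small generalised update
```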


### additional things to try:

#### dependence on prediction error

$\kappa$ will be a free parameter that indicates difference in dependence on prediction error history, and will make the v updates like so:


Value plus:

$$VPlus[t+1,p]=VPlus[t,p]+\kappa*alpha[p]*PredErrorPlus[t,p]*G_p$$

Value minus:

$$VMinus[t+1,p]=VMinus[t,p]+\kappa*alpha[p]*PredErrorMinus[t,p]*G_m$$

#### updating learning rate per trial, per stimulus

The alpha would become

$$\alpha_{t+1} = \eta\,|PredError_{p,t}| + (1-\eta)\,\alpha_t$$
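A toy sketch of this trial-wise learning-rate update (Pearce-Hall style: large surprises push alpha up, small ones let it decay; values are illustrative):

```{r}
# toy sketch of the Pearce-Hall style learning-rate update
eta        <- 0.3   # weight on the current surprise
alpha_t    <- 0.2   # current learning rate
pred_err   <- 0.8   # prediction error on this trial
alpha_next <- eta * abs(pred_err) + (1 - eta) * alpha_t  # 0.38
```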


## Model 11: Visual generalisation



Allow visual generalisation between the stimuli to change their values per trial.

Multiply a stimulus's own value by 0.8 and add 0.2 times the other stimulus's value (see equations above).

### Model

```{r vis gen set up}
#script
stanname='beta_Visual_Gen.stan'

```


```{r vis gen}

flare_fit <- model_run('min',stanname,flare_data_nolog)

## get some basic output descriptions printed to screen
out_describe(flare_fit,nsub)

# extract fit data
summary_flare <- summary(flare_fit)

```


#### Create BIC from log likelihood

```{r}
## extract log likelihood

flare_loglike <- extract_log_lik(flare_fit, parameter_name = "loglik", merge_chains = TRUE)

#calculate BIC

FLARe_bic<-bic(ntrials,-colMeans(flare_loglike),2) # third argument = number of free parameters in this model

## mean BIC as model comparisons tool:

print("Mean Bayesian information criterion for model")

mean(FLARe_bic)

```

#### Add to bar plot

```{r}

mod_comp <- rbind(na.omit(mod_comp),c("Visual Generalisation",mean(FLARe_bic)))

mod_comp$BIC <- odp(as.numeric(mod_comp$BIC))

mod_comp <- as.data.frame(na.omit(mod_comp))


## plot function - create plot
plot_models(mod_comp)

```

# avoidance

Using volume per trial as an avoidance outcome and modelling as per [Norbury et al](https://f1000.com/work/item/5243419/resources/6038039/pdf) in a sort of drift-diffusion way. Basically updating the value of avoiding or not avoiding, where avoid == volume turned down on that trial.

Might want to use a Pearce-Hall associability rule to update the learning rate here...

>> "According to this rule, the learning rate on each trial is determined by the absolute magnitude of past prediction errors, such that state-action value estimates are updated by more when previous outcomes have been more surprising, and by less when they were less surprising. This allows for learning in terms of modelled value adjustment to be greater when outcomes are more surprising (e.g. at the start of the task), but to be lesser (leading to more stable values) when outcomes are better predicted. A non-constant learning rate also ensures that parameters governing width of value-based generalization, which are assumed to be constant over the course of the task, are identifiable during parameter estimation (see below equations)."


# Forgetting

This parameter captures how much participants retain what they learned over previous trials and use it to inform the current rating.
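One simple way this could be implemented (an assumption, not settled in this lab book) is exponential decay of value toward an indifference point between trials:

```{r}
# hypothetical forgetting sketch: value decays toward an indifference
# point (0.5) between trials, governed by a retention parameter
# (named gamma_ret to avoid clashing with base R's gamma function)
gamma_ret <- 0.9    # 1 = perfect retention, 0 = complete forgetting
v_t    <- 0.8
v_next <- gamma_ret * v_t + (1 - gamma_ret) * 0.5  # 0.77
```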




# to do

* Investigate change point detection parameters (when the reinforcement changes, i.e. moving from acquisition to extinction)
  * could do this or model the phases separately - check which fits best
* Add priors! These are what you expect the group to look like (i.e. alpha is normally distributed around a mean of 0.5 with variance of 10, or something)
* Look up RStan choice of priors
  * can have informative or uninformative priors (i.e. agnostic or not)
* Avoidance as volume reductions
* Add parameters
* DO NOT FORGET TO MAKE SURE WE HAVE AN ACCURATE SCREAM PATTERN PER PERSON FOR CS+ IN ACQUISITION



# Push any updates to github 

##### Push the updates to github

Uncomment the commands below if you made any changes.


```{bash}

## initialise bash directory and filename
stanname="punish_only.stan"
scriptdir="/Users/kirstin/Dropbox/SGDP/FLARe/FLARe_MASTER/Projects/Hierachal_modelling/Scripts"


## stage
#git add $scriptdir/$stanname

## commit (needed before the push will do anything)
#git commit -m "update stan scripts"

## push
#git push Bayes_modelling


```






